Analytical Chemistry
May 1, 1996

Analytical Chemistry 1996, (68) 305A-309A
Copyright © 1996 by the American Chemical Society.

A Practical Guide to Analytical Method Validation

Doing a thorough method validation can be tedious, but the consequences of not doing it right are wasted time, money, and resources

The ability to provide timely, accurate, and reliable data is central to the role of analytical chemists and is especially true in the discovery, development, and manufacture of pharmaceuticals. Analytical data are used to screen potential drug candidates, aid in the development of drug syntheses, support formulation studies, monitor the stability of bulk pharmaceuticals and formulated products, and test final products for release. The quality of analytical data is a key factor in the success of a drug development program. The process of method development and validation has a direct impact on the quality of these data.

Although a thorough validation cannot rule out all potential problems, the process of method development and validation should address the most common ones. Examples of typical problems that can be minimized or avoided are synthesis impurities that coelute with the analyte peak in an HPLC assay; a particular type of column that no longer produces the separation needed because the supplier of the column has changed the manufacturing process; an assay method that is transferred to a second laboratory where they are unable to achieve the same detection limit; and a quality assurance audit of a validation report that finds no documentation on how the method was performed during the validation.

Problems increase as additional people, laboratories, and equipment are used to perform the method. When the method is used in the developer's laboratory, a small adjustment can usually be made to make the method work, but the flexibility to change it is lost once the method is transferred to other laboratories or used for official product testing. This is especially true in the pharmaceutical industry, where methods are submitted to regulatory agencies and changes may require formal approval before they can be implemented for official testing. The best way to minimize method problems is to perform adequate validation experiments during development.

What is method validation?
Method validation is the process of proving that an analytical method is acceptable for its intended purpose. For pharmaceutical methods, guidelines from the United States Pharmacopeia (USP) (1), International Conference on Harmonisation (ICH) (2), and the Food and Drug Administration (FDA) (3, 4) provide a framework for performing such validations. In general, methods for regulatory submission must include studies on specificity, linearity, accuracy, precision, range, detection limit, quantitation limit, and robustness.

Although there is general agreement about what type of studies should be done, there is great diversity in how they are performed (5). The literature contains diverse approaches to performing validations (as in References 6-10). This Report presents an approach to performing validation studies that encompasses much of the current literature and provides practical guidance. This approach should be viewed with the understanding that validation requirements are continually changing and vary widely, depending on the type of drug being tested, the stage of drug development, and the regulatory group that will review the drug application. For our purposes, we will discuss validation studies as they apply to chromatographic methods, although the same principles apply to other analytical techniques.

In the early stages of drug development, it is usually not necessary to perform all of the various validation studies. Many researchers focus on specificity, linearity, accuracy, and precision studies for drugs in the preclinical through Phase II (preliminary efficacy) stages. The remaining studies are performed when the drug reaches the Phase III (efficacy) stage of development and has a higher probability of becoming a marketed product.

The process of validating a method cannot be separated from the actual development of the method conditions, because the developer will not know whether the method conditions are acceptable until validation studies are performed. The development and validation of a new analytical method may therefore be an iterative process. Results of validation studies may indicate that a change in the procedure is necessary, which may then require revalidation. During each validation study, key method parameters are determined and then used for all subsequent validation steps. To minimize repetitious studies and ensure that the validation data are generated under conditions equivalent to the final procedure, we recommend the following sequence of studies.

Establish minimum criteria
The first step in the method development and validation cycle should be to set minimum requirements, which are essentially acceptance specifications for the method. A complete list of criteria should be agreed on by the developer and the end users before the method is developed so that expectations are clear.

For example, is it critical that method precision (RSD) be ≤ 2%? Does the method need to be accurate to within 2% of the target concentration? Is it acceptable to have only one supplier of the HPLC column used in the analysis? During the actual studies and in the final validation report, these criteria will allow clear judgment about the acceptability of the analytical method.

Examples of minimum criteria are provided throughout this article that indicate practical ways to evaluate the acceptability of data from each validation study. The statistics generated for making comparisons are similar to what analysts will generate later in the routine use of the method and therefore can serve as a tool for evaluating later questionable data. More rigorous statistical evaluation techniques are available and should be used in some instances, but these may not allow as direct a comparison for method troubleshooting during routine use.

Demonstrate specificity
For chromatographic methods, developing a separation involves demonstrating specificity, which is the ability of the method to accurately measure the analyte response in the presence of all potential sample components. The response of the analyte in test mixtures containing the analyte and all potential sample components (placebo formulation, synthesis intermediates, excipients, degradation products, process impurities, etc.) is compared with the response of a solution containing only the analyte. Other potential sample components are generated by exposing the analyte to stress conditions sufficient to degrade it to 80-90% purity. For bulk pharmaceuticals, stress conditions such as heat (50 °C), light (600 FC), acid (0.1 N HCl), base (0.1 N NaOH), and oxidant (3% H2O2) are typical. For formulated products, heat, light, and humidity (85%) are often used.

The resulting mixtures are then analyzed, and the analyte peak is evaluated for peak purity and resolution from the nearest eluting peak. If an alternate chromatographic column is to be allowed in the final method procedure, it should be identified during these studies. Once acceptable resolution is obtained for the analyte and potential sample components, the chromatographic parameters, such as column type, mobile-phase composition, flow rate, and detection mode, are considered set.

An example of specificity criteria for an assay method is that the analyte peak will have baseline chromatographic resolution of at least 1.5 from all other sample components. If this cannot be achieved, the unresolved components at their maximum expected levels will not affect the final assay result by more than 0.5%. An example of specificity criteria for an impurity method is that all impurity peaks that are ≥ 0.1% by area will have baseline chromatographic resolution from the main component peak(s) and, where practical, will have resolution from all other impurities.
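The resolution criterion can be checked numerically. The sketch below assumes the common baseline-width definition of resolution, Rs = 2(t2 - t1)/(w1 + w2), which the article itself does not spell out; the function names and peak values are hypothetical.

```python
def resolution(t_r1, t_r2, w1, w2):
    """Baseline-width resolution: Rs = 2 * |t_r2 - t_r1| / (w1 + w2)."""
    if w1 <= 0 or w2 <= 0:
        raise ValueError("peak widths must be positive")
    return 2.0 * abs(t_r2 - t_r1) / (w1 + w2)


def meets_assay_specificity(analyte_peak, other_peaks, min_rs=1.5):
    """True if the analyte is resolved by at least min_rs from every other peak.

    Each peak is a (retention_time, baseline_width) tuple in the same time units.
    """
    t_a, w_a = analyte_peak
    return all(resolution(t_a, t, w_a, w) >= min_rs for t, w in other_peaks)


# Hypothetical peaks: (retention time in min, baseline width in min)
analyte = (6.0, 0.5)
others = [(4.0, 0.5), (7.0, 0.5)]
print(resolution(6.0, 7.0, 0.5, 0.5))
print(meets_assay_specificity(analyte, others))
```

Only the nearest eluting peak on each side usually matters, but checking every peak keeps the sketch simple.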

Demonstrate linearity
A linearity study verifies that the sample solutions are in a concentration range where analyte response is linearly proportional to concentration. For assay methods, this study is generally performed by preparing standard solutions at five concentration levels, from 50 to 150% of the target analyte concentration. Five levels are required to allow detection of curvature in the plotted data. The standards are evaluated using the chromatographic conditions determined during the specificity studies.

Standards should be prepared and analyzed a minimum of three times. The 50 to 150% range for this study is wider than what is required by the FDA guidelines. In the final method procedure, a tighter range of three standards is generally used, such as 80, 100, and 120% of target; and in some instances, a single standard concentration is used.

Validating over a wider range provides confidence that the routine standard levels are well removed from nonlinear response concentrations, that the method covers a wide enough range to incorporate the limits of content uniformity testing, and that it allows quantitation of crude samples in support of process development. For impurity methods, linearity is determined by preparing standard solutions at five concentration levels over a range such as 0.05-2.5 wt%.

Acceptability of linearity data is often judged by examining the correlation coefficient and y-intercept of the linear regression line for the response versus concentration plot. A correlation coefficient of > 0.999 is generally considered as evidence of acceptable fit of the data to the regression line. The y-intercept should be less than a few percent of the response obtained for the analyte at the target level.
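As a minimal sketch of this evaluation, the regression statistics can be computed by ordinary least squares. The data values and function names are hypothetical, and expressing the y-intercept relative to the fitted (rather than measured) response at the target level is an assumption made for illustration.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def intercept_percent_of_target(slope, intercept, target_conc):
    """|y-intercept| as a percentage of the fitted response at the target level."""
    return 100.0 * abs(intercept) / (intercept + slope * target_conc)


# Hypothetical five-level assay curve, 50-150% of a 20 mg/mL target
conc = [10.0, 15.0, 20.0, 25.0, 30.0]
area = [101.0, 149.5, 200.5, 249.0, 300.5]
slope, intercept, r = linear_fit(conc, area)
print(f"slope={slope:.4f} intercept={intercept:.4f} r={r:.5f}")
print(f"intercept = {intercept_percent_of_target(slope, intercept, 20.0):.2f}% of target response")
```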

Although these are very practical ways of evaluating linearity data, they are not true measures of linearity (11, 12). These parameters, by themselves, can be misleading and should not be used without a visual examination of the response versus concentration plot. An example of how the use of correlation coefficients can be misleading can be seen in data from an HPLC method for quantitation of mannitol. This method uses an internal standard, so the data are recorded as peak area ratios (mannitol area/internal standard area). Figure 1 is a plot of mannitol peak area ratio versus mannitol concentration for standards analyzed by the method. Although the correlation coefficient of the linear regression is > 0.999 (top), the plot indicates small deviations from linearity at low and high concentrations. An alternate way of evaluating the data is to plot response factor [(peak area ratio - y-intercept)/concentration] versus concentration (also shown in Figure 1).

Figure 1. Peak area ratio (circles) and response factor (squares) versus concentration for mannitol.

(Top) Concentration range is 5-80 mg/mL. For peak area ratio line, y = 0.09775 + 0.080569x and correlation coefficient = 0.99952. (Bottom) Concentration range is 12-28 mg/mL. For peak area ratio line, y-0.027 + 0.08625x and correlation coefficient = 0.99965.

If an equivalent response was obtained at each concentration, the data points would form a straight line with a zero slope. The response factors plotted in Figure 1(top) vary greatly over the range and fall only within 15% of the target concentration. A second set of mannitol data, over a narrower range of concentrations, is shown in Figure 1(bottom). The response factors for all concentrations in this range are within 1.5% of the target concentration response. The near-zero slope of the response factor plot indicates that a linear response is obtained over this concentration range.
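The response factor evaluation can be sketched as follows. The concentrations and peak area ratios are invented for illustration (they are not the mannitol data of Figure 1), and the function names are hypothetical.

```python
def response_factors(conc, response, intercept=0.0):
    """Response factor (response - intercept) / concentration at each level."""
    return [(r - intercept) / c for c, r in zip(conc, response)]


def max_deviation_percent(factors, target_factor):
    """Largest absolute deviation from the target-level factor, in percent."""
    return max(abs(f - target_factor) / target_factor * 100.0 for f in factors)


# Hypothetical peak area ratios with slight curvature at the extremes
conc = [5.0, 10.0, 20.0, 40.0, 80.0]
ratio = [0.45, 0.86, 1.70, 3.35, 6.40]
factors = response_factors(conc, ratio)
target = factors[2]  # treat the 20 mg/mL level as the target (an assumption)
print([round(f, 4) for f in factors])
print(round(max_deviation_percent(factors, target), 2))
```

A flat list of response factors corresponds to the zero-slope line described above; a trend at either end of the list flags curvature that a high correlation coefficient can hide.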

At the completion of linearity studies, the appropriate concentration range for the standards and the injection volume should be set for all subsequent studies.

An example of linearity criteria for an assay method is that the correlation coefficient for each of three curves (five concentration levels each) will be ≥ 0.99 for the range 80-120% of the target concentration. The y-intercept will be ≤ 2% of the target concentration response. An alternative criterion is that a plot of response factor versus concentration will show all values within 2.5% of the target-level response factor for concentrations between 80 and 120% of the target concentration. For an impurity method, the correlation coefficient for each of three curves (five concentration levels each) will be ≥ 0.98 for the range 0.1-2.5% of the main component concentration. The y-intercept will be ≤ 10% of the response produced for a 2.5 wt% impurity. An alternative criterion is that a plot of response factor versus concentration will show all values within 5% of the mean response factor for concentrations ≥ 0.5 wt% and within 10% of the mean response factor for concentrations < 0.5 wt%.

Demonstrate accuracy
The accuracy of a method is the closeness of the measured value to the true value for the sample. Accuracy is usually determined in one of four ways. First, accuracy can be assessed by analyzing a sample of known concentration and comparing the measured value to the true value. National Institute of Standards and Technology (NIST) reference standards are often used; however, such a well-characterized sample is usually not available for new drug-related analytes. The second approach is to compare test results from the new method with results from an existing alternate method that is known to be accurate. Again, for pharmaceutical studies, such an alternate method is usually not available.

The third and fourth approaches are based on the recovery of known amounts of analyte spiked into sample matrix. The third approach, which is the most widely used recovery study, is performed by spiking analyte in blank matrices. For assay methods, spiked samples are prepared in triplicate at three levels over a range of 50-150% of the target concentration. If potential impurities have been isolated, they should be added to the matrix to mimic impure samples. For impurity methods, spiked samples are prepared in triplicate at three levels over a range that covers the expected impurity content of the sample, such as 0.1-2.5 wt%. The analyte levels in the spiked samples should be determined using the same quantitation procedure as will be used in the final method procedure (i.e., same number and levels of standards, same number of sample and standard injections, etc.). The percent recovery should then be calculated.

The fourth approach is the technique of standard additions, which can also be used to determine recovery of spiked analyte. This approach is used if it is not possible to prepare a blank sample matrix without the presence of the analyte. This can occur, for example, with lyophilized material, in which the speciation in the lyophilized material is significantly different when the analyte is absent.

An example of accuracy criteria for an assay method is that the mean recovery will be 100 ± 2% at each concentration over the range of 80-120% of the target concentration. For an impurity method, the mean recovery will be within 0.1% absolute of the theoretical concentration or 10% relative, whichever is greater, for impurities in the range of 0.1-2.5 wt%.
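A spiked-recovery evaluation against criteria of this kind might be sketched as below. The replicate values, level labels, and function names are hypothetical; the tolerances are the example values quoted above (mean recovery within 2% of 100 for assays; 0.1% absolute or 10% relative, whichever is greater, for impurities).

```python
def percent_recovery(measured, spiked):
    """Recovery of a spiked amount, in percent."""
    return 100.0 * measured / spiked


def mean_recovery_by_level(results):
    """results maps a spike level label to a list of (measured, spiked) replicates."""
    return {
        level: sum(percent_recovery(m, s) for m, s in reps) / len(reps)
        for level, reps in results.items()
    }


def assay_accuracy_ok(mean_recoveries, tolerance=2.0):
    """True if every level's mean recovery is within 100 +/- tolerance percent."""
    return all(abs(rec - 100.0) <= tolerance for rec in mean_recoveries.values())


def impurity_accuracy_ok(mean_found, theoretical):
    """Within 0.1 wt% absolute or 10% relative of theory, whichever is greater."""
    allowed = max(0.1, 0.10 * theoretical)
    return abs(mean_found - theoretical) <= allowed


# Hypothetical triplicate spikes (measured, spiked) in mg/mL
spikes = {
    "80%": [(15.9, 16.0), (16.1, 16.0), (16.0, 16.0)],
    "100%": [(19.8, 20.0), (20.1, 20.0), (20.2, 20.0)],
    "120%": [(24.3, 24.0), (23.9, 24.0), (24.1, 24.0)],
}
means = mean_recovery_by_level(spikes)
print({level: round(rec, 2) for level, rec in means.items()})
print(assay_accuracy_ok(means))
```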

Determine the range
The range of an analytical method is the concentration interval over which acceptable accuracy, linearity, and precision are obtained. In practice, the range is determined using data from the linearity and accuracy studies. Assuming that acceptable linearity and accuracy (recovery) results were obtained as described earlier, the only remaining factor to be evaluated is precision. This precision data should be available from the triplicate analyses of spiked samples in the accuracy study.

Figure 2 illustrates how precision may change as a function of analyte level. The %RSD values for ethanol quantitation by GC increased significantly as the concentration decreased from 1000 ppm to 10 ppm. Higher variability is expected as the analyte levels approach the detection limit for the method. The developer must judge at what concentration the imprecision becomes too great for the intended use of the method.

Figure 2. %RSD versus concentration for a GC headspace analysis of ethanol.

An example of range criteria for an assay method is that the acceptable range will be defined as the concentration interval over which linearity and accuracy are obtained per previously discussed criteria and that yields a precision of ≤ 3% RSD. For an impurity method, the acceptable range will be defined as the concentration interval over which linearity and accuracy are obtained per the above criteria, and that, in addition, yields a precision of ≤ 10% RSD.
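One way to turn such criteria into a range is to scan the per-level summary from the accuracy study. The sketch below assumes linearity has already been shown over the levels supplied, takes the longest contiguous run of passing levels as the range (an illustrative choice, not a rule from the guidelines), and uses hypothetical data.

```python
def acceptable_range(levels, max_rsd=3.0, recovery_tol=2.0):
    """Return (low, high) of the longest contiguous run of passing levels, or None.

    levels: list of (concentration, mean_recovery_percent, rsd_percent) tuples.
    """
    best = None
    run = []
    for conc, recovery, rsd in sorted(levels):
        if abs(recovery - 100.0) <= recovery_tol and rsd <= max_rsd:
            run.append(conc)
            if best is None or len(run) > len(best):
                best = list(run)
        else:
            run = []  # a failing level breaks the interval
    return (best[0], best[-1]) if best else None


# Hypothetical summary: (% of target, mean recovery %, RSD %)
summary = [
    (50, 96.0, 4.0),
    (80, 99.5, 1.2),
    (100, 100.2, 0.9),
    (120, 100.8, 1.1),
    (150, 101.0, 3.5),
]
print(acceptable_range(summary))
```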

Determine precision, Round 1
The precision of an analytical method is the amount of scatter in the results obtained from multiple analyses of a homogeneous sample. To be meaningful, the precision study must be performed using the exact sample and standard preparation procedures that will be used in the final method.

The first type of precision study is instrument precision or injection repeatability (3). A minimum of 10 injections of one sample solution is made to test the performance of the chromatographic instrument. The second type is repeatability or intra-assay precision (2). Intra-assay precision data are obtained by repeatedly analyzing, in one laboratory on one day, aliquots of a homogeneous sample, each of which has been independently prepared according to the method procedure. From these precision studies, the sample preparation procedure, the number of replicate samples to be prepared, and the number of injections required for each sample in the final method procedure will be set. Two additional types of precision studies are described later in Round 2.

An example of precision criteria for an assay method is that the instrument precision (RSD) will be ≤ 1% and the intra-assay precision will be ≤ 2%. For an impurity method, at the limit of quantitation, the instrument precision will be ≤ 5% and the intra-assay precision will be ≤ 10%.
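The RSD values behind these criteria are straightforward to compute. The following sketch uses the sample (n - 1) standard deviation; the injection and preparation results are hypothetical, and the default limits are the assay example values (instrument RSD no greater than 1%, intra-assay RSD no greater than 2%).

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation in percent, using the sample (n - 1) std dev."""
    return 100.0 * stdev(values) / mean(values)


def precision_ok(injections, preparations, instrument_limit=1.0, intra_assay_limit=2.0):
    """Check instrument precision and intra-assay precision against their limits."""
    return (
        percent_rsd(injections) <= instrument_limit
        and percent_rsd(preparations) <= intra_assay_limit
    )


# Hypothetical data: 10 injections of one solution (peak areas) and
# 6 independently prepared aliquots (assay results, % of label)
injections = [1002, 998, 1001, 999, 1000, 1003, 997, 1000, 1001, 999]
preparations = [99.2, 100.5, 100.1, 98.9, 100.8, 99.7]
print(round(percent_rsd(injections), 2), round(percent_rsd(preparations), 2))
print(precision_ok(injections, preparations))
```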

Widen the scope
Once these validation studies are complete, the method developers should be confident in the ability of the method to provide good quantitation in their own laboratories. This result may be sufficient for many methods, especially in the early phases of drug development. The remaining studies should provide greater assurance that the method will work well in other laboratories, where different operators, instruments, and reagents are involved and where it will be used over much longer periods of time.

This is a good time to begin accumulating data for two or more system suitability criteria, which are required prior to routine use of the method to ensure that it is performing appropriately. Typically, the process involves making five injections of a standard solution and evaluating several chromatographic parameters (1) such as resolution, area % reproducibility, number of theoretical plates, and tailing factor.
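Two of these parameters can be computed directly from peak measurements. The sketch below assumes the pharmacopeial-style definitions N = 16(tR/W)^2 (W is the baseline peak width) and T = W0.05/(2f) (W0.05 is the width at 5% of peak height, f the distance from the leading edge to the peak maximum at that height); the article does not give these formulas, and the numbers are hypothetical.

```python
from statistics import mean, stdev


def theoretical_plates(t_r, w_base):
    """Plate number N = 16 * (t_r / W)^2, with W the peak width at baseline."""
    return 16.0 * (t_r / w_base) ** 2


def tailing_factor(w_005, f):
    """Tailing factor T = W_0.05 / (2 * f); T = 1 for a symmetric peak."""
    return w_005 / (2.0 * f)


def area_rsd_percent(areas):
    """Area reproducibility across replicate standard injections, as %RSD."""
    return 100.0 * stdev(areas) / mean(areas)


# Hypothetical results from five injections of a standard solution
areas = [1520.0, 1516.0, 1523.0, 1518.0, 1521.0]
print(theoretical_plates(10.0, 0.5))
print(tailing_factor(0.5, 0.25))
print(round(area_rsd_percent(areas), 3))
```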

Establish the detection limit
The detection limit of a method is the lowest analyte concentration that produces a response detectable above the noise level of the system, typically, three times the noise level. The detection limit needs to be determined only for impurity methods in which chromatographic peaks near the detection limit will be observed. The detection limit should be estimated early in the method development-validation process and should be repeated using the specific wording of the final procedure if any changes have been made. It is important to test the method detection limit on different instruments, such as those used in the different laboratories to which the method will be transferred. An example of detection limit criteria is that, at the 0.05% level, an impurity will have S/N ≥ 3.
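A signal-to-noise check across instruments could look like the sketch below. Conventions for measuring noise differ between guidelines and data systems; here noise is simply taken as the peak-to-peak excursion of a blank baseline segment, which is an assumption, and the instrument names and readings are hypothetical.

```python
def peak_to_peak_noise(baseline):
    """Noise estimated as max - min of a blank baseline segment (one convention)."""
    return max(baseline) - min(baseline)


def signal_to_noise(peak_height, noise):
    """Ratio of peak height to noise, both in the same detector units."""
    if noise <= 0:
        raise ValueError("noise must be positive")
    return peak_height / noise


def detection_check(readings, min_sn=3.0):
    """readings maps instrument name -> (peak_height, noise); returns pass/fail per instrument."""
    return {
        name: signal_to_noise(height, noise) >= min_sn
        for name, (height, noise) in readings.items()
    }


# Hypothetical 0.05% impurity peak measured on two instruments
readings = {"HPLC-A": (6.0, 1.5), "HPLC-B": (5.0, 2.0)}
print(detection_check(readings))
```

Here the second instrument gives S/N = 2.5 and fails the criterion, which is the kind of transfer problem this test is meant to expose.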

Establish the quantitation limit
The quantitation limit is the lowest level of analyte that can be accurately and precisely measured. This limit is required only for impurity methods and is determined by reducing the analyte concentration until a level is reached where the precision of the method is unacceptable. If not determined experimentally, the quantitation limit is often calculated as the analyte concentration that gives S/N = 10. An example of quantitation limit criteria is that the limit will be defined as the lowest concentration level for which an RSD ≤ 20% is obtained when an intra-assay precision study is performed.
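The experimental approach, stepping down in concentration until precision becomes unacceptable, can be sketched as follows. The replicate results are hypothetical, and the 20% RSD limit is the example criterion; the function name is illustrative.

```python
from statistics import mean, stdev


def quantitation_limit(replicates_by_conc, max_rsd=20.0):
    """Lowest tested concentration reached before precision first exceeds max_rsd.

    replicates_by_conc maps concentration -> list of replicate results.
    Levels are scanned from highest to lowest; returns None if even the
    highest level fails.
    """
    loq = None
    for conc in sorted(replicates_by_conc, reverse=True):
        values = replicates_by_conc[conc]
        rsd = 100.0 * stdev(values) / mean(values)
        if rsd > max_rsd:
            break
        loq = conc
    return loq


# Hypothetical intra-assay results (wt%) at decreasing impurity levels
data = {
    0.5: [0.50, 0.51, 0.49],
    0.1: [0.10, 0.11, 0.09],
    0.05: [0.05, 0.07, 0.03],
}
print(quantitation_limit(data))
```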

Establish stability
During the earlier validation studies, the method developer gained some information on the stability of reagents, mobile phases, standards, and sample solutions. For routine testing in which many samples are prepared and analyzed each day, it is often essential that solutions be stable enough to allow for delays such as instrument breakdowns or overnight analyses using autosamplers. At this point, the limits of stability should be tested. Samples and standards should be tested over at least a 48-h period, and quantitation of components should be determined by comparison to freshly prepared standards. If the solutions are not stable over 48 h, storage conditions or additives should be identified that can improve stability.

An example of stability criteria for assay methods is that sample and standard solutions and the mobile phase will be stable for 48 h under defined storage conditions. Acceptable stability is ≤ 2% change in standard or sample response, relative to freshly prepared standards. The mobile phase is considered to have acceptable stability if aged mobile phase produces equivalent chromatography (capacity factors, resolution, or tailing factor) and assay results are within 2% of the value obtained with fresh mobile phase.

For impurity methods, the sample and standard solutions and mobile phase will be stable for 48 h under defined storage conditions. Acceptable stability is ≤ 20% change in standard or sample response at the limit of quantitation, relative to freshly prepared standards. The mobile phase is considered to have acceptable stability if aged mobile phase produces equivalent chromatography and if impurity results at the limit of quantitation are within 20% of the values obtained with fresh mobile phase.
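The percent-change comparison behind these stability criteria can be sketched as below. The time points and responses are hypothetical, and the limits are the example values (2% for assay methods, 20% at the limit of quantitation for impurity methods).

```python
def percent_change(aged, fresh):
    """Change in response of an aged solution relative to a fresh one, in percent."""
    return 100.0 * (aged - fresh) / fresh


def is_stable(aged, fresh, max_change=2.0):
    """True if the absolute change does not exceed max_change percent."""
    return abs(percent_change(aged, fresh)) <= max_change


def stable_hours(responses_by_hour, fresh, max_change=2.0):
    """Longest storage time (h) up to which every tested time point passes."""
    last_ok = 0
    for hours in sorted(responses_by_hour):
        if not is_stable(responses_by_hour[hours], fresh, max_change):
            break
        last_ok = hours
    return last_ok


# Hypothetical assay responses of a stored sample vs. a fresh standard (= 100.0)
aged = {24: 99.5, 48: 98.8, 72: 96.0}
print(stable_hours(aged, 100.0))                    # assay limit of 2%
print(stable_hours(aged, 100.0, max_change=20.0))   # impurity limit of 20%
```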

Establish precision, Round 2
The remaining precision studies comprise much of what historically has been called ruggedness. Intermediate precision (2) is the precision obtained when the assay is performed by multiple analysts, using multiple instruments, on multiple days, in one laboratory. Different sources of reagents and multiple lots of columns should also be included in this study. Intermediate precision results are used to identify which of the above factors contribute significant variability to the final result.

The last type of precision study is reproducibility (2), which is determined by testing homogeneous samples in multiple laboratories, often as part of interlaboratory crossover studies. The evaluation of reproducibility results often focuses more on measuring bias in results than on determining differences in precision alone. Statistical equivalence is often used as a measure of acceptable interlaboratory results. An alternative, more practical approach is the use of "analytical equivalence" in which a range of acceptable results is chosen prior to the study and used to judge the acceptability of the results obtained from the different laboratories.

An example of reproducibility criteria for an assay method could be that the assay results obtained in multiple laboratories will be statistically equivalent or the mean results will be within 2% of the value obtained by the primary testing lab. For an impurity method, results obtained in multiple laboratories will be statistically equivalent or the mean results will be within 10% (relative) of the value obtained by the primary testing lab for impurities ≥ 1 wt%, within 25% for impurities from 0.1-1.0 wt%, and within 50% for impurities < 0.1 wt%.
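The "analytical equivalence" comparison can be sketched as a simple tolerance check against the primary laboratory's mean. The tiered impurity tolerances follow the example criteria (10% relative at 1 wt% and above, 25% from 0.1 to 1.0 wt%, 50% below 0.1 wt%); the laboratory results and function names are hypothetical.

```python
def impurity_tolerance(level_wt_percent):
    """Allowed relative difference (%) for an impurity at the given level."""
    if level_wt_percent >= 1.0:
        return 10.0
    if level_wt_percent >= 0.1:
        return 25.0
    return 50.0


def analytically_equivalent(primary_mean, other_mean, tolerance_percent):
    """True if other_mean is within tolerance_percent (relative) of primary_mean."""
    difference = abs(other_mean - primary_mean) / primary_mean * 100.0
    return difference <= tolerance_percent


# Hypothetical crossover results
print(analytically_equivalent(100.0, 101.5, 2.0))                      # assay
print(analytically_equivalent(0.40, 0.48, impurity_tolerance(0.40)))   # impurity
```

Fixing the tolerance before the study, as the text recommends, is what distinguishes this from choosing a limit that fits the data after the fact.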

Is it robust?
The robustness of a method is its ability to remain unaffected by small changes in parameters such as percent organic content and pH of the mobile phase, buffer concentration, temperature, and injection volume. These method parameters may be evaluated one factor at a time or simultaneously as part of a factorial experiment (13). Obtaining data on the effects of these parameters may allow a range of acceptable values to be included in the final method procedure. For example, if column performance changes over time, adjusting the mobile-phase strength to compensate for changes in the column may be allowed if such data are included in the validation.

An example of robustness criteria is that the effects of the following changes in chromatographic conditions will be determined: methanol content in mobile phase adjusted by ± 2%, mobile-phase pH adjusted by ± 0.1 pH units, and column temperature adjusted by ± 5 °C. If these changes are within the limits that produce acceptable chromatography, they will be incorporated in the method procedure.
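If these three factors are varied simultaneously, a two-level full factorial design gives eight runs. The sketch below only enumerates the run conditions; the nominal values are hypothetical (the article specifies only the size of each adjustment), and the analysis of the resulting chromatography is left out.

```python
from itertools import product


def full_factorial(nominal, deltas):
    """All low/high combinations (2^k runs) for the factors named in deltas."""
    names = list(deltas)
    runs = []
    for signs in product((-1, 1), repeat=len(names)):
        run = dict(nominal)
        for name, sign in zip(names, signs):
            run[name] = nominal[name] + sign * deltas[name]
        runs.append(run)
    return runs


# Hypothetical nominal conditions; deltas follow the example criteria
nominal = {"methanol_percent": 40.0, "ph": 3.0, "temperature_c": 30.0}
deltas = {"methanol_percent": 2.0, "ph": 0.1, "temperature_c": 5.0}
runs = full_factorial(nominal, deltas)
for run in runs:
    print(run)
```

A one-factor-at-a-time study would need only six runs plus the nominal condition, but it cannot reveal interactions between factors, which is the usual argument for the factorial layout.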

Doing it right the first time
Performing a thorough method validation can be a tedious process, but the quality of data generated with the method is directly linked to the quality of this process. Time constraints often do not allow for sufficient method validations. Many researchers have experienced the consequences of invalid methods and realized that the amount of time and resources required to solve problems discovered later exceeds what would have been expended initially if the validation studies had been performed properly. We hope that we have provided a guide to help you wend your way efficiently through the method validation maze and eliminate many of the problems common to inadequately validated analytical methods.

I wish to thank Bruce Burgess, Joseph Glajch, and the DuPont Merck radiopharmaceuticals methods quality team for their contributions in formulating many of the concepts presented in this paper.