Design of Experiments in Advanced Manufacturing: A Practitioner's Guide

Design of experiments (DOE) is the structured way to learn how multiple process variables interact to produce a measurable output, using the smallest number of experimental runs that can support a defensible model. It replaces guesswork and one-factor-at-a-time testing with a planned matrix of runs, a fitted statistical model, and a clear answer about which factors matter and in what combination. In advanced manufacturing, DOE is the difference between a process you understand and one you tune by feel.

What is design of experiments and why does it matter for advanced manufacturing?

DOE is a methodology, not a single technique. It covers the planning of an experiment (factor selection, factor ranges, response variables), the construction of a run matrix that gives you the most information for the fewest runs, the execution of those runs in randomized order, and the statistical analysis that turns the resulting data into a model you can use for decision-making.

The standard reference texts are Box, Hunter and Hunter^[1] and Montgomery^[2]. Both were written for chemical and mechanical engineers, and both predate the era of machine learning. The math has not changed. The problem has, because the processes that need optimization today are more coupled than the ones DOE was originally developed for.

Consider a battery electrode coating line. The slurry has rheology that depends on solvent ratio, dispersant loading, mixing energy, and temperature. The coating has weight uniformity that depends on slot-die gap, line speed, and slurry viscosity. The dried electrode has porosity that depends on coating thickness, drying temperature profile, and substrate tension. Calendering changes that porosity again. A change to slurry viscosity propagates through every downstream step.

In a process this coupled, OFAT (one-factor-at-a-time) experimentation produces results that are locally correct and globally wrong. You will find a slurry viscosity that works at the current calendering setpoint, but you will not see that the optimum shifts when you adjust calendering pressure to fix a porosity problem. Román-Ramírez and Marco^[3] reviewed the published DOE literature applied to lithium-ion batteries and found that, while applications are growing, the methodology remains underused relative to the number of process variables involved.
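
A minimal sketch of why OFAT misleads when factors interact. The response function and its coefficients are invented for illustration and do not come from any real coating line; both factors are in coded units (-1 = low, +1 = high).

```python
from itertools import product

def response(x1, x2):
    """Hypothetical response with a strong x1*x2 interaction (coded units)."""
    return 50 + 2 * x1 + 2 * x2 + 6 * x1 * x2

# OFAT: start at the baseline and move one factor at a time,
# keeping a change only if it improves the response.
best = (-1, -1)
for i in range(2):
    trial = list(best)
    trial[i] = +1
    if response(*trial) > response(*best):
        best = tuple(trial)
ofat_best = best

# 2x2 full factorial: run all four corners and pick the best.
factorial_best = max(product((-1, 1), repeat=2), key=lambda p: response(*p))

print("OFAT settles at", ofat_best, "->", response(*ofat_best))
print("Factorial finds", factorial_best, "->", response(*factorial_best))
```

In this toy case OFAT spends three runs and stays at the baseline because each single-factor move looks worse on its own. The factorial spends four runs and finds the corner where both factors are high.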

Why most engineers still don't use DOE

The data on this is direct. Tanco, Viles, Ilzarbe, and Alvarez^[4] surveyed manufacturers in the Basque Country and found that 94% of companies undertake experimentation, but only 20% use DOE. Just 3% use it frequently. The remaining 80% rely on OFAT or unstructured "best guess" experimentation. The same authors followed up with a survey of three European regions and found similar patterns^[5].

The barriers were not what you might expect. The two leading reasons cited by survey respondents were "theoretical ignorance of DOE" (43%) and "absence of a clear methodology" (37%)^[5]. Cost and time came up much lower. The bottleneck is cognitive, not financial.

This matches what you see on the floor. A process engineer trained in chemical or mechanical engineering knows what factors matter in their process. They know which response variables they care about. What they do not know is which DOE design to choose, how to handle a categorical factor mixed with continuous factors, what to do when a run goes bad, or how to interpret an interaction plot once they have the data.

The result is that DOE gets pushed onto a Six Sigma black belt or a corporate statistics group, and the process engineer who actually understands the chemistry is one step removed from the design. By the time the analysis comes back, the process has drifted, the engineer has moved to the next problem, and the model collects dust. Tanco and colleagues found that 76% of respondents believe a simpler methodology is needed^[5]. At Niobia AI, we treat that 76% as the design specification.

The five stages of a DOE that actually works

Every well-run DOE follows the same five stages. Skipping any of them is one of the most common reasons DOEs fail in industry.

1. Define the problem and the response. What measurable output do you care about? Coating weight uniformity in g/m². Cell capacity at 1C. Wafer-to-wafer thickness sigma. The response must be measurable on every run and must reflect the actual quality outcome you care about. Surrogate responses are a common failure mode.

2. Identify factors and ranges. What process variables do you suspect matter, and over what ranges can you actually run them? Process engineering judgment dominates here. Statistical methodology cannot tell you which factors matter. It can only tell you which of the factors you have nominated turn out to be active. Get the candidate list wrong and the experiment is wasted.

3. Choose the design. Screening, characterization, or optimization. Two-level or three-level. Factorial, fractional factorial, definitive screening, central composite, Box-Behnken, mixture, or D-optimal custom. The choice depends on factor count, run budget, suspected interactions, and whether you need quadratic terms.

4. Run the experiment. Randomize the run order. Block against known noise sources like shift, material lot, and ambient conditions. Replicate selected runs to estimate pure error. This is where production realities collide with statistical theory and where most engineers benefit from a second pair of eyes.

5. Fit the model and decide. ANOVA, regression coefficients, residual diagnostics, profilers, contour plots. The output is a model that predicts the response across the design space and a recommended setpoint. The decision is what to do with that model.
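
Stage 4 can be sketched in a few lines. This is an illustration under stated assumptions, not a production run-sheet generator: the factor names `A`, `B`, `C` are placeholders, the design is a plain 2^3 factorial with three replicated center points, and blocking is left out to keep the example short.

```python
import random
from itertools import product

factors = ["A", "B", "C"]           # hypothetical factor names
corners = [dict(zip(factors, lv)) for lv in product((-1, 1), repeat=3)]
center = {name: 0 for name in factors}

# 8 factorial corners plus 3 replicated center points for pure error.
runs = corners + [dict(center) for _ in range(3)]

rng = random.Random(42)             # fixed seed so the sheet is reproducible
rng.shuffle(runs)                   # randomize the execution order

for order, run in enumerate(runs, start=1):
    print(order, run)
```

Fixing the seed is a convenience for record-keeping: the operator's sheet and the analyst's file then show the same order.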

Choosing the right design: a quick decision tree

The biggest single decision in any DOE is design choice, and it is the one where most engineers freeze. Here is the practitioner's version of the choice.

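If you have many factors (six or more) and you suspect most are inactive, use a screening design. A fractional factorial at Resolution III or IV will tell you which main effects matter for as few as eight to sixteen runs. The cost is confounding: at Resolution III, main effects are aliased with two-factor interactions, and at Resolution IV, main effects are clear of two-factor interactions but the two-factor interactions are confounded with each other.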
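
The confounding can be made concrete with the smallest Resolution IV case, a 2^(4-1) half-fraction. The sketch below assumes the textbook generator D = ABC; the helper names are illustrative.

```python
from itertools import product

# Half-fraction 2^(4-1): full factorial in A, B, C; D set by generator D = ABC.
design = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def column(*idx):
    """Elementwise product of the chosen factor columns (0=A, 1=B, 2=C, 3=D)."""
    out = []
    for run in design:
        p = 1
        for i in idx:
            p *= run[i]
        out.append(p)
    return out

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

print("runs:", len(design))
print("AB identical to CD:", column(0, 1) == column(2, 3))
print("A orthogonal to BC:", dot(column(0), column(1, 2)) == 0)
```

Because the AB and CD columns are identical, no analysis of these eight runs can tell the two interactions apart. Separating them takes follow-up runs or process knowledge.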

If you have many factors and you suspect quadratic effects matter, use a definitive screening design. Jones and Nachtsheim^[6] introduced these in 2011, and they have become the dominant choice for early-stage characterization in process development. A DSD requires 2k+1 runs for k factors, separates main effects from second-order effects, and can detect curvature without a separate follow-up experiment.
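
A small sketch of the DSD structure for k = 4. It assumes the conference-matrix construction (a matrix stacked on its mirror image, plus one center run), which is a convenient way to build these designs and not necessarily the procedure used in the original paper^[6]. The 4x4 matrix is hard-coded for illustration.

```python
from itertools import combinations

# Hard-coded 4x4 conference matrix: zero diagonal, +/-1 elsewhere,
# and mutually orthogonal columns.
C = [
    [0, 1, 1, 1],
    [-1, 0, 1, -1],
    [-1, -1, 0, 1],
    [-1, 1, -1, 0],
]
k = len(C)

# DSD = C stacked on its mirror image -C, plus one center run.
dsd = [row[:] for row in C] + [[-x for x in row] for row in C] + [[0] * k]

def col(j):
    return [run[j] for run in dsd]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

print("runs:", len(dsd), "expected 2k+1 =", 2 * k + 1)

# Main effects are orthogonal to each other ...
main_ok = all(dot(col(i), col(j)) == 0 for i, j in combinations(range(k), 2))

# ... and to every second-order term (two-factor interactions and squares).
second_order = [[a * b for a, b in zip(col(i), col(j))]
                for i in range(k) for j in range(i, k)]
clear_ok = all(dot(col(m), term) == 0 for m in range(k) for term in second_order)

print("main effects mutually orthogonal:", main_ok)
print("main effects clear of second-order terms:", clear_ok)
```

The mirror-image (foldover) pairs are what keep main effects clear of second-order terms. The three levels per factor are what make curvature detectable.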

If you have a small number of factors (two to five) and you need an accurate response surface, use a central composite or Box-Behnken design. These give you the quadratic terms you need to find an optimum and to estimate process robustness near that optimum. Run counts go from 13 (a three-factor Box-Behnken with a single center point) to about 30 (a five-factor central composite built on a half-fraction, with a few center points).
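
The run counts quoted above can be reproduced under specific assumptions. The sketch below builds a Box-Behnken design with the pairwise construction (which covers three to five factors) and counts central composite runs. The number of center points is a free choice, and the values used here are assumptions picked to match the figures in the text.

```python
from itertools import combinations, product

def box_behnken(k, n_center=1):
    """Box-Behnken runs for k = 3, 4 or 5 (pairwise construction), coded units."""
    if k not in (3, 4, 5):
        raise ValueError("pairwise construction shown here covers k = 3..5 only")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    return runs + [(0,) * k] * n_center

def ccd_run_count(k, n_center, fraction=0):
    """Central composite: 2^(k - fraction) corners + 2k axial points + centers."""
    return 2 ** (k - fraction) + 2 * k + n_center

print(len(box_behnken(3, n_center=1)))           # three-factor Box-Behnken
print(ccd_run_count(5, n_center=4, fraction=1))  # five-factor CCD, half fraction
```

Adding center points raises both counts, so treat them as lower bounds when planning a run budget.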

If your factors are constrained or your design space is irregular, use a custom D-optimal or I-optimal design. These are computer-generated designs that fit the constraints of your problem. They are nearly always the right answer for industrial DOEs because real factor ranges are rarely independent.
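
To show what "computer-generated" means here, the sketch below picks a D-optimal design by exhaustive search over a tiny constrained candidate set. Everything in it is an assumption for illustration: the constraint `x1 + x2 <= 1`, the four-term model, and the six-run budget. Commercial tools use exchange algorithms instead of brute force and also allow replicated points, which this search does not.

```python
from itertools import combinations, product

def model_row(x1, x2):
    """Model terms: intercept, two main effects, one interaction."""
    return [1.0, x1, x2, x1 * x2]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= f * m[c][cc]
    return d

def information_det(points):
    """det(X'X) for the design made of the given points."""
    X = [model_row(*p) for p in points]
    n_terms = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(n_terms)]
           for i in range(n_terms)]
    return det(xtx)

# Candidate grid in coded units, minus an infeasible high-high region
# (hypothetical constraint: x1 + x2 <= 1).
candidates = [p for p in product((-1, 0, 1), repeat=2) if p[0] + p[1] <= 1]

n_runs = 6
best_design = max(combinations(candidates, n_runs), key=information_det)
print(best_design, information_det(best_design))
```

Maximizing det(X'X) shrinks the joint confidence region of the model coefficients, which is the D-optimality criterion. An I-optimal search would swap in average prediction variance as the objective.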

JMP's Custom Designer was the first widely available implementation of optimal designs, and the JMP documentation^[7] remains the practical reference for engineers running them. The math is well understood. The bottleneck is choosing the right design from the menu, which is exactly the decision the agent in Niobia AI automates.
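
The rules of thumb in this section can be written down as a small function. This is only a restatement of the text for illustration, not the logic of any product or agent. The function name, argument names, and the final fallback branch are assumptions.

```python
def choose_design(n_factors, need_quadratic=False, constrained=False):
    """Map the rules of thumb above to a design family (illustrative only)."""
    if constrained:
        return "custom D-optimal or I-optimal"
    if n_factors >= 6:
        if need_quadratic:
            return "definitive screening"
        return "fractional factorial (Resolution III/IV)"
    if 2 <= n_factors <= 5 and need_quadratic:
        return "central composite or Box-Behnken"
    # Not covered by the rules above: assumed fallback for few factors
    # with no suspected curvature.
    return "full factorial"

print(choose_design(8))
print(choose_design(7, need_quadratic=True))
print(choose_design(3, need_quadratic=True))
print(choose_design(4, constrained=True))
```

Checking the constraint flag first mirrors the point made above: when factor ranges are not independent, a custom design is usually the right answer regardless of factor count.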

From data to model: where DOE actually pays off

Running the experiment is half the work. The model is what you actually use.

Once the runs are complete, the analysis fits a regression model to the data. ANOVA tells you which factors and interactions are statistically significant. Residual plots tell you whether the model is structurally correct. The prediction profiler, originally a JMP construct and now standard in modern DOE software, lets you slide each factor and watch the response move in real time.

The output of a good analysis is three things. First, a ranked list of factors by effect size, telling you what actually drives the response. Second, a fitted equation that lets you predict the response at any setpoint inside the design space. Third, an optimum setpoint with a confidence interval around it.
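
The first two outputs can be sketched for the simplest case, a two-level full factorial, where each least-squares coefficient reduces to a column sum. The "measured" data here come from an invented noise-free function, so the example shows the mechanics only. A real analysis would add replicate-based error estimates, ANOVA, and residual checks.

```python
from itertools import product

design = list(product((-1, 1), repeat=3))   # 2^3 full factorial, coded units

def true_process(a, b, c):
    """Stand-in for measured data: invented coefficients, no noise."""
    return 100 + 5 * a - 2 * b + 0.5 * c + 3 * a * b

y = [true_process(*run) for run in design]

terms = {
    "A": lambda a, b, c: a,
    "B": lambda a, b, c: b,
    "C": lambda a, b, c: c,
    "AB": lambda a, b, c: a * b,
    "AC": lambda a, b, c: a * c,
    "BC": lambda a, b, c: b * c,
}

# In an orthogonal two-level design each least-squares coefficient is
# simply sum(column * y) / N.
n = len(design)
coef = {name: sum(f(*run) * yi for run, yi in zip(design, y)) / n
        for name, f in terms.items()}

# Ranked list of terms by absolute effect size.
ranked = sorted(coef.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranked:
    print(f"{name:>2}: {value:+.2f}")
```

The ranking puts the AB interaction ahead of the B and C main effects, which is the kind of result a one-factor-at-a-time study cannot produce.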

For coupled processes like battery electrode manufacturing, that model is what closes the loop between process control and quality. Hidalgo, Apachitei, Dogaru, Faraji-Niri, Lain, Copley, and Marco^[8] ran a 3×3×2 full factorial DOE on NMC622 calendering, varying roll temperature (85, 120, 145 °C), post-calendered porosity (30, 35, 40%), and mass loading (120 and 180 g/m²) across 18 unique experiments with 54 half-cells for replication. The fitted multiple linear regression model identified the factor combinations that maximized electrochemical performance and quantified the interaction between mass loading and porosity that an OFAT approach would have missed entirely.
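
The run count follows directly from the factor levels. The sketch below enumerates the conditions using the levels quoted above. The three cells per condition is an inference from 54 / 18, not a figure stated in the text, and the variable names are illustrative.

```python
from itertools import product

# Factor levels as quoted above from the calendering study.
roll_temperature_c = (85, 120, 145)
porosity_pct = (30, 35, 40)
mass_loading_gsm = (120, 180)

conditions = list(product(roll_temperature_c, porosity_pct, mass_loading_gsm))
cells_per_condition = 3   # assumption: 54 half-cells / 18 conditions

print(len(conditions))                         # unique experiments
print(len(conditions) * cells_per_condition)   # half-cells in total
```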

Where most teams get DOE wrong

The published literature treats DOE as a statistical problem. On the floor, the failures are not statistical. They are structural.

The most common failure is poor factor selection. A team runs a beautifully designed experiment on the wrong factors. The design is balanced, the analysis is clean, the results are uninformative. The factor that mattered was never in the design. Process engineering judgment must drive the candidate list, and that judgment is usually scattered across three or four people who never sit in the same room.

The second failure is contamination by process drift. A DOE assumes that the only variation in the response is what comes from the deliberate manipulation of factors. But material lots change. Operators change. Ambient temperature changes. If you run an unblocked DOE across two shifts, the shift effect aliases with whatever factor you happened to vary on day two. The data looks clean and the conclusion is wrong.
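
A short sketch of the aliasing problem and the standard remedy for a 2^3 factorial split across two shifts. Assigning blocks by the sign of the three-factor interaction is the textbook choice; the variable names are illustrative.

```python
from itertools import product

design = list(product((-1, 1), repeat=3))   # 2^3 factorial in A, B, C

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

col_a = [run[0] for run in design]
col_b = [run[1] for run in design]
col_c = [run[2] for run in design]

# Unblocked, standard order: all A = -1 runs on shift 1, all A = +1 on shift 2.
shift_naive = [-1] * 4 + [1] * 4
print("naive shift column equals A:", shift_naive == col_a)

# Blocked: assign shift by the sign of the ABC interaction instead.
shift_blocked = [a * b * c for a, b, c in design]
for name, col in (("A", col_a), ("B", col_b), ("C", col_c)):
    print(name, "vs shift:", dot(col, shift_blocked))
```

In the naive layout the shift column is identical to the A column, so a shift effect is indistinguishable from the effect of A. In the blocked layout the shift is orthogonal to all three main effects, and the only thing sacrificed is the ABC interaction, which is rarely of interest.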

The third failure is interpretation overreach. The model is valid inside the design space and only marginally valid outside it. Engineers extrapolate anyway. The optimum predicted by the model lies outside the region covered by the runs that were actually executed, and the predicted behavior does not appear when the process is run there. The fix is confirmation runs at the recommended setpoint. They are almost always skipped.
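
A guard against this failure can be as simple as comparing the recommended setpoint with the tested ranges. The factor names and numbers below are hypothetical. The check treats the design space as a box, so it will not catch a point that sits inside every individual range but outside a constrained or irregular region.

```python
def outside_design_space(setpoint, ranges):
    """Return the factors whose proposed value lies outside the tested range."""
    return [name for name, value in setpoint.items()
            if not ranges[name][0] <= value <= ranges[name][1]]

# Hypothetical tested ranges and a model-recommended setpoint.
tested = {"line_speed_m_min": (5.0, 15.0), "dryer_temp_c": (90.0, 130.0)}
proposed = {"line_speed_m_min": 12.0, "dryer_temp_c": 138.0}

flagged = outside_design_space(proposed, tested)
if flagged:
    print("Extrapolating on:", flagged, "-> schedule confirmation runs")
```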

Niobia AI sees these failures across customer sites because the platform ingests both the experimental data and the surrounding production telemetry. The agent flags when a candidate factor list is missing a known driver of the response, when a run sequence is at risk of drift contamination, and when a recommended setpoint is too far from the design space to be trusted without a confirmation run.

What changes when an AI agent runs the DOE

The premise of Niobia AI for DOE is that the engineer should specify the problem, not the design. The engineer states the response, names the candidate factors and their feasible ranges, and gives the run budget. The agent does the rest. It selects the design class, generates the run matrix, builds the run sheet for the operator, blocks against known noise sources, fits the model when the runs come back, generates the profiler, writes the structured interpretation, and recommends the setpoint with an explicit confidence statement.

This collapses the time-to-model from weeks to hours for the cases where the engineer would have called a corporate statistics group. More importantly, it makes DOE viable for the cases where the engineer would have defaulted to OFAT. JMP-using teams at Johnson Matthey have publicly reported time and resource savings of 50% to 70% from DOE versus OFAT on internal projects^[7]. An agent that removes the design-choice barrier extends that benefit to the engineering teams that never adopted DOE in the first place.

The goal is not to replace the process engineer. The goal is to put the methodology that the textbooks have prescribed for fifty years into the hands of the people who actually run the process.

Summary

Design of experiments is the structured method for finding which factors and interactions drive a measurable output, using the fewest possible runs. It is underused in industry: surveys put OFAT adoption at 80% and formal DOE adoption at 20%, with only 3% of manufacturers using it frequently^[4]. The barrier is methodological complexity, not cost. Niobia AI automates the steps that engineers find hardest, including design selection, run sheet construction, model fitting, and interpretation, reducing time-to-model from weeks to hours and making DOE viable for the production engineers who currently default to OFAT.

About the author

Dr. Gaurav Jha is the Founder of Niobia AI, which builds AI-powered defect detection and process intelligence platforms for battery gigafactories. His PhD focused on fast-charging niobium pentoxide (Nb₂O₅) based nanostructured anodes, with broader research spanning gas sensors, ion sensors, and energy storage materials. At Intel, he worked on wet etch defect reduction in 5nm and 7nm chip fabrication, developing a hands-on instinct for process root cause analysis at scale that translates directly to electrode manufacturing.

He returned to batteries to develop one of the first large-scale lithium-sulfur cathode coatings at Lyten, then moved to Sila Nanotechnologies, where he worked on silicon anode particles for high energy density and fast-charging applications across consumer electronics and automotive programs. Across these roles, Dr. Jha led manufacturing scale-up from lab to high-volume production, conducted industrial root cause investigations, commercialized key materials products, and developed new electrode chemistries from first principles. He founded Niobia AI to bring that depth of manufacturing and materials science experience into an AI platform built specifically for the production floor.

References

1. Box, G.E.P., Hunter, J.S., & Hunter, W.G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Wiley Series in Probability and Statistics. Wiley. ISBN: 978-0471718130.
2. Montgomery, D.C. (2017). Design and Analysis of Experiments (9th ed.). Wiley. ISBN: 978-1119113478.
3. Román-Ramírez, L.A., & Marco, J. (2022). Design of experiments applied to lithium-ion batteries: A literature review. Applied Energy, 320, 119305. https://doi.org/10.1016/j.apenergy.2022.119305
4. Tanco, M., Viles, E., Ilzarbe, L., & Alvarez, M.J. (2008). Is design of experiments really used? A survey of Basque industries. Journal of Engineering Design, 19(5), 447-460. https://doi.org/10.1080/09544820701749124
5. Tanco, M., Viles, E., Ilzarbe, L., & Alvarez, M.J. (2009). Implementation of Design of Experiments projects in industry. Applied Stochastic Models in Business and Industry, 25(4), 478-505. https://doi.org/10.1002/asmb.779
6. Jones, B., & Nachtsheim, C.J. (2011). A class of three-level designs for definitive screening in the presence of second-order effects. Journal of Quality Technology, 43(1), 1-15. https://doi.org/10.1080/00224065.2011.11917841
7. JMP Statistical Discovery LLC. (2024). Design of Experiments Capabilities Documentation. SAS Institute. jmp.com/en/software/capabilities/design-of-experiments
8. Hidalgo, M.F.V., Apachitei, G., Dogaru, D., Faraji-Niri, M., Lain, M., Copley, M., & Marco, J. (2023). Design of experiments for optimizing the calendering process in Li-ion battery manufacturing. Journal of Power Sources, 573, 233091. https://doi.org/10.1016/j.jpowsour.2023.233091