DoE: THE experimental method that every scientist should be aware of

What is Design of Experiments (DoE) and why use it?

Design of Experiments (DoE or sometimes DoX) is an umbrella term that has been coined to cover a particular statistical design and evaluation approach to performing experiments. In essence it is a hugely powerful and very efficient way to conduct experiments where you have more than one variable that you would like to study.
Phil Kay, who describes himself as an Evangelist for Data Analysis, says that “Every scientist and engineer in the world should be able to use DoE: It is THE most important method in statistics and data science for industrial R&D and manufacturing”.

Would you rather do 15 experiments or 5?

Perhaps the best way to illustrate this is with a relatively simple example. Say you have a reaction with 2 variable factors, temperature and pH, and you want to study their impact on the final yield of your product. All the other factors (time, reagent concentrations, etc.) are held constant as far as is experimentally possible. You might have some historical (random trial) data for the reaction (Figure 2a), and from that you decide to perform a structured study by varying One Factor At a Time (OFAT). In your 1st set of OFAT trials, you fix the reaction pH at one level (pH 7.5) and then vary the temperature between 10°C and 100°C in 9 separate experiments. From this set of experiments, you discover that the optimum temperature is 40°C and that it gives you a yield of 63% (Figure 1a). For your 2nd set of OFAT trials, you then fix the temperature at 40°C and run 6 experiments between pH 5.0 and pH 10.0, and find that at pH 6.0 you have improved the yield to 71% (Figure 1b). Hoorah!

Figure 1: Yield data from a) 1st OFAT run with pH held at 7.5 and b) 2nd OFAT run with temperature held at 40°C.

BUT... in reality it transpires that the optimum yield of 83% is obtained at 80°C and pH 9.0 (the red point on all of the charts in Figure 2). You have run 15 experiments (not including the initial random experiments in Figure 2a) and, although the yield is reasonable, you have missed the sweet spot because you didn’t experiment in that region.

Figure 2: Experimental space studied in a) the initial random trials, b) the OFAT experiments and c) a full factorial DoE with a centre point. The point in red in all charts is the actual process optimum point where a yield of 83% is obtained.

With the DoE approach, doing less but with more focus gives you more, much more. It is kind of the Tai Chi of experimental design. In a DoE trial you vary both factors at the same time, setting them at the corners of the experimental range you wish to study, and you often include a centre point (Figure 2c). In this example you only run 5 experiments, but you are much more likely to have settings that are near the optimum condition for the system than with the OFAT approach, because you have covered more of the experimental space. Your second set of DoE experiments can then explore the most promising region in more detail if you wish.
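As a rough illustration of the five-run design described above, the short Python sketch below lists the four corner runs and the centre point for the temperature and pH example. The ranges (10 to 100°C and pH 5.0 to 10.0) are an assumption borrowed from the OFAT trials, and the function and variable names are made up for this example rather than taken from any DoE package.

```python
from itertools import product

def full_factorial_with_centre(ranges):
    """Return the corner runs of a 2-level full factorial plus one centre point.

    `ranges` maps each factor name to its (low, high) setting.
    """
    names = list(ranges)
    # Every combination of low/high settings: 2**k corner runs for k factors.
    corners = [dict(zip(names, combo))
               for combo in product(*(ranges[n] for n in names))]
    # The centre point sits midway between low and high for every factor.
    centre = {n: (ranges[n][0] + ranges[n][1]) / 2 for n in names}
    return corners + [centre]

# Assumed study ranges (not stated in the article for the DoE itself).
ranges = {"temperature_C": (10.0, 100.0), "pH": (5.0, 10.0)}
design = full_factorial_with_centre(ranges)

for run in design:
    print(run)
print(len(design), "runs in total")
```

With these assumed ranges the centre point lands at 55°C and pH 7.5, and the whole design is 5 runs compared with the 15 used in the OFAT study.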

The real power comes when you extend this to a study varying 3, 4, 5 or more factors at once. This is not only possible in a DoE trial, but relatively straightforward to achieve (Figure 3 shows the design space for 3 factors set at 2 levels with a centre point). If, like Doc Brown in Back to the Future, you can think fourth-dimensionally, perhaps you can visualise the experimental ground you can cover with 4 or even more factors studied at once.

Figure 3: Illustration of a factorial DoE design with 3 factors each set at 2 levels, with one centre point (red point) included. 9 experimental runs in total.
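To get a feel for how the number of runs grows with the number of factors, here is a minimal sketch that builds a 2-level full factorial in coded units (-1 for the low setting, +1 for the high setting, 0 for the centre point). The function name is hypothetical; in practice a DoE software package would build the design for you.

```python
from itertools import product

def coded_full_factorial(n_factors, centre_points=1):
    """2-level full factorial in coded units, plus optional centre points."""
    # All combinations of -1/+1 across the factors: 2**n_factors corner runs.
    corners = list(product((-1, 1), repeat=n_factors))
    # Each centre point has every factor at its mid (0) level.
    centre = [(0,) * n_factors] * centre_points
    return corners + centre

for k in (2, 3, 4, 5):
    print(k, "factors:", len(coded_full_factorial(k)), "runs")
```

This gives 5, 9, 17 and 33 runs for 2, 3, 4 and 5 factors respectively; the 3-factor case is the 9-run design in Figure 3. The corner runs double with every added factor, which is why the choice of design type matters once the factor count climbs.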

How we use DoE at Peak Proteins

Our managing director, Mark Abbott, was one of the early adopters of DoE for recombinant protein production. In a 2002 paper he and his team used a fractional factorial design to optimise the refolding of Cathepsin S from inclusion bodies. In a later study, the aim was to optimise the process for transiently transfecting CHO cells. In the early days you needed a card-carrying statistician to help both design the experiment and interpret the data obtained. Fortunately, things have moved on, and there are now software packages that do the heavy statistical lifting for you. This allows scientists to go a long way with DoE and the statistical interpretation on their own. Mind you, it is still good to have a statistician available to discuss the detail with when you get out of your depth. We use JMP software, which allows us to design, perform and interpret a range of DoE methodologies such as Full and Fractional Factorial, Definitive Screening, Response Surface and other designs.

At Peak Proteins, we have team members with significant (>10 years) experience of using DoE and other statistical methods for biopharmaceutical manufacturing process control and process development (both upstream fermentation and downstream purification). We are currently aiming to leverage this knowledge and increase our use of DoE, applying it where appropriate to processes we are trying to optimise. This covers projects such as optimising refolding conditions from inclusion bodies (similar to the Cathepsin S example above), how best to solubilise membrane proteins, optimising cell culture protocols, how to obtain the best activity from an enzyme, and how to improve the yield from a protein chromatography protocol. As you can see, the potential applications are wide and diverse. See our case study in one of the coming months’ newsletters (Mar 2023) for an example where we have used DoE to optimise an E. coli fermentation process and the subsequent lysis and clarification steps to improve the yield of a target recombinant protein.

The main considerations when starting a DoE trial

There are a few key points to consider when designing a DoE trial, and these influence the type of design that you select. The first thing to decide is how many different factors you are going to study, balanced against how much time and resource you have. This very much depends on the complexity of the experimental set-up and protocol. If you have a lot of factors and just want to see which ones are having an impact on your process, then a screening design is a good place to start. However, with these types of design you often cannot tell whether the factors interact and affect each other. If that is important, then factorial designs are needed, but these take more individual runs (more time and resource). Do you need to see if there are quadratic effects (curvature in the response)? Do you need to get data on the inherent experimental variation of the system? If yes, more runs are required. There are solutions to all these considerations, but as you can see, time spent thinking through the overall approach and design is time very well spent.
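The trade-off between run count and information about interactions can be seen in a small sketch of a half-fraction factorial for 4 factors. It assumes the common textbook construction in which the fourth factor's setting is the product of the other three (D = ABC); this is for illustration only and is not necessarily the design a software package would recommend. The function name is hypothetical.

```python
from itertools import product
from math import prod

def half_fraction(n_factors):
    """Half-fraction 2-level design in coded units (-1/+1).

    Builds a full factorial in the first n_factors - 1 factors and sets the
    last factor to the product of the others (assumed generator, e.g. D = ABC).
    """
    base = product((-1, 1), repeat=n_factors - 1)
    return [run + (prod(run),) for run in base]

half = half_fraction(4)
print(len(half), "runs instead of", 2 ** 4)

# The price of halving the runs: some effects can no longer be told apart.
# In every run the A*B interaction column is identical to the C*D column.
ab_equals_cd = all(a * b == c * d for a, b, c, d in half)
print("A*B indistinguishable from C*D:", ab_equals_cd)
```

Here 8 runs replace 16, but the A×B and C×D interaction columns are identical, so the design alone cannot say which of the two is responsible for an observed effect. Separating them means adding runs, which is the balance described above.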

More information

There are plenty of resources on DoE on the web, especially on YouTube. A good place to start, though, is the free online course provided by JMP, “Statistical Thinking for Industrial Problem Solving”, which has DoE as one of its modules.

How can I apply DoE to my project?

If you have a protein expression and purification project that you think would benefit from a DoE approach, then please don’t hesitate to get in touch with us at info@peakproteins.com.