A Comparison of the Structural Techniques used at Sygnature Discovery: X-ray Crystallography, NMR and Cryo-EM
In this article we discuss the three techniques used for determining protein and nucleic acid structures, their individual merits, what each workflow looks like, and how they fit into drug discovery.
Introduction
Understanding protein structures is fundamental to unravelling their functions, interactions, and potential therapeutic applications. There are three main techniques used for protein structure determination: X-ray Crystallography, Nuclear Magnetic Resonance (NMR), and Cryo-Electron Microscopy (Cryo-EM), and we are fortunate to offer all three at Peak Proteins (part of Sygnature Discovery). X-ray crystallography has been the leading technique for the structure determination of biological macromolecules for decades and remains the workhorse for high-throughput structure determination. Cryo-EM usage has exploded in the last 5-10 years, largely due to advances in instrumentation and computing (both hardware and software). NMR is the least commonly used for this purpose; however, it offers the advantage of exploring the dynamics and interactions of macromolecules in solution. All three methods can support structure-based drug design, and we discuss the pros and cons of each technique below.
Why we still need structures and shouldn’t rely on AI generated models
The use of structure prediction methods, particularly AlphaFold, has increased in popularity over the last decade. While structure prediction is an extremely powerful tool, it is not without its limitations and is certainly not a replacement for experimental data. For overall fold or topology, AlphaFold can frequently give accurate predictions. However, enzymatic mechanisms and protein-protein/protein-ligand interactions are much harder to predict, and questions about them can only be answered with confidence using experimental methods.
X-ray Crystallography
Background
X-ray crystallography is the dominant technique for determining three-dimensional protein structures, accounting for 84% of the total structures deposited into the Protein Data Bank (PDB) (as of September 2024). In X-ray crystallography, protein crystals are exposed to high energy X-rays which are scattered upon interacting with electrons. The ordered array of protein molecules in a crystal amplifies these scattered X-rays in a way that a single protein molecule would not. The resulting scattered X-rays appear as a pattern of spots on a detector. Encoded in this pattern is the amplitude information for the diffracted X-rays. This, in addition to the phase information, is used to determine the precise coordinates of the individual atoms within a protein.
In drug discovery, X-ray crystallography can provide atomic resolution information about the interactions within protein-ligand complexes. When developing small-molecule drugs, this information is invaluable for guiding the design of more potent and specific lead compounds. As well as guiding design, X-ray crystallography is routinely used as a method of fragment screening. Here, a library of small fragments is soaked into a well-established crystal system and the datasets are examined to identify binding events. Any fragment that is identified can then be grown and developed through further rounds of synthesis and X-ray crystallography. Within Sygnature Discovery we have a fully integrated pipeline to drive a fragment design process.
X-Ray Crystallography Protein Structure Determination Workflow
Sample and technical requirements
Sample and buffer conditions:
Samples need to be purified to homogeneity and be reasonably stable. Generally, a good starting point for a crystallisation screen is 5 mg of protein at around 10 mg/ml, or nucleic acid at around 5-10 mg/ml. Stability is important as the sample may have to be incubated (at 20 °C or 4 °C) with the crystallisation cocktails for days to weeks before the protein solution reaches supersaturation and crystals nucleate. A range of buffers is suitable for crystallisation; however, phosphate-containing buffers are not ideal, as phosphate readily crystallises with the divalent cations present in many screens, leading to false positives.
There is no size limit, either small or large, for samples to be studied by X-ray crystallography. Structure determination of very large complexes by crystallography can be challenging; as the size and complexity of the target increases so do the difficulties in obtaining well-ordered crystals suitable for analysis.
Instrumentation:
The majority of X-ray diffraction data is collected at 3rd generation synchrotrons. These circular particle accelerators work by accelerating electrons through sequences of magnets, which produce extremely bright light in the X-ray region which can be focused and tuned in specific beamlines.
Crystallisation
In order to obtain diffraction data from a protein it is necessary to crystallise the protein of interest. The principle of crystallisation is to take a high concentration of protein in solution and induce it to come out of solution at a rate that promotes crystal growth and not precipitation. The range of variables for these perfect conditions is wide – precipitant, buffer, pH, protein concentration, temperature, technique, additives. If conditions for initial crystal formation are identified, even if crystals are small or diffract poorly, optimisation can be used to improve size and diffraction quality.
The process of crystallisation presents the largest hurdle in X-ray crystallography and there are no guarantees that a given protein will ever yield crystals. However, there are several methods to make a protein more conducive to crystal formation. These include removal of flexible regions, domains or glycosylation sites to reduce overall flexibility and increase the number of stable crystal contacts.
Integral membrane proteins pose a specific problem for crystallisation, as they are often purified using detergents or presented in nanodiscs in order to mimic the hydrophobic membrane. The membrane mimetic is generally more fluid/less ordered than the protein and so minimises potential crystal contact sites. Incorporating integral membrane proteins into a lipidic cubic phase (LCP), which provides a more stable environment for crystallisation, has been very successful, particularly for GPCRs.
Data collection and processing
Once grown, crystals can be screened at a synchrotron to determine the diffraction limits and optimise a data collection strategy. A complete dataset is then collected, with parameters including detector distance, degrees of rotation and radiation dose set for optimal data quality. The resulting dataset will likely contain thousands of individual images, each with a pattern of diffraction spots.
The diffraction spots on the images are indexed, intensities measured, and crystal symmetry and space group proposed; data are then scaled and merged, resulting in a file (mtz/cif) which contains a complete list of the indexed spots with amplitude information. To determine a 3-dimensional protein structure, this information must be combined with the phase information. As the phase information is not captured during an X-ray crystallography experiment (the ‘phase problem’), it must be estimated using one of a handful of methods. The most common method is molecular replacement, where the data are searched with a structure which is very similar to the target structure. If molecular replacement is not possible, experimental phasing methods can be used. These involve collecting data from crystals before and after heavy atom soaking (isomorphous replacement), or collecting diffraction data at several wavelengths selected to excite specific atoms, leading to anomalous diffraction.
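To illustrate why amplitudes alone are not enough, the following is a minimal one-dimensional sketch in Python. It is a toy model and not part of any crystallographic package: the two "atoms", their scattering weights, the grid size and the sign convention of the Fourier sum are all assumptions made for illustration. It computes structure factors for a known arrangement, then rebuilds the density once with the true phases and once with every phase set to zero.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D unit cell.

    atoms is a list of (fractional position, scattering weight) pairs.
    """
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(factors, n_points):
    """rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), sampled on a regular grid."""
    rho = []
    for k in range(n_points):
        x = k / n_points
        value = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
        rho.append(value.real)
    return rho


def top_peaks(rho, count):
    """Grid indices of the `count` highest local maxima (periodic cell)."""
    n = len(rho)
    maxima = [k for k in range(n) if rho[k] > rho[k - 1] and rho[k] >= rho[(k + 1) % n]]
    return sorted(sorted(maxima, key=lambda k: rho[k], reverse=True)[:count])


# Hypothetical toy cell: two atoms at fractional positions 0.20 and 0.55.
atoms = [(0.20, 6.0), (0.55, 8.0)]
true_F = structure_factors(atoms, h_max=10)

# What the detector effectively records: amplitudes only, phases lost.
amplitudes_only = {h: abs(F) for h, F in true_F.items()}

with_phases = top_peaks(density(true_F, 100), 2)
without_phases = top_peaks(density(amplitudes_only, 100), 2)

print("peaks with true phases:", with_phases)        # [20, 55]
print("peaks with phases set to zero:", without_phases)
```

With the correct phases the two strongest peaks fall on grid points 20 and 55, i.e. the true atom positions. With the phases discarded, the strongest peak moves to the origin and the atom positions are no longer recovered, which is why phases have to be estimated by molecular replacement or experimental phasing.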
Once initial phases are estimated, an electron density map can be calculated, and an atomic model built and refined against the data. The refinement process involves alteration of the modelled structure with the aim to improve the agreement with the observed data while satisfying chemical requirements for bond lengths, angles and atomic interactions.
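The agreement between model and data that refinement tries to improve is commonly summarised by the crystallographic R-factor, R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|. The R-factor is not named in the text above, so the following is only an illustrative sketch: the amplitudes are invented, and real refinement programs also scale the calculated amplitudes and monitor a cross-validation set (R-free).

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor for matched lists of structure factor amplitudes.

    Assumes f_obs and f_calc are already on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must be the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections.
observed = [100.0, 80.0, 60.0, 40.0]
initial_model = [90.0, 95.0, 50.0, 45.0]
refined_model = [98.0, 82.0, 59.0, 41.0]

r_initial = r_factor(observed, initial_model)
r_refined = r_factor(observed, refined_model)
print(f"R before refinement: {r_initial:.3f}")
print(f"R after refinement:  {r_refined:.3f}")
```

A lower value indicates that the amplitudes calculated from the model match the measured ones more closely.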
Our current X-ray crystallography services at Sygnature Discovery
At Sygnature Discovery we have decades of experience solving X-ray crystal structures over a wide range of different protein classes and nucleotide targets. We offer a full X-ray crystallography pipeline including crystal screening and optimisation, data collection, structural determination and analysis.
Our dedicated X-ray crystallography lab is equipped with a STP Biotech Dragonfly and Mosquito liquid handlers, state-of-the-art imaging systems (UVEX from Jansi and RockImager from Formulatrix) and a Crystal Shifter (Oxford LabTech). Once we have crystals, data are collected during our regular visits to 3rd generation synchrotrons (DLS, ESRF, Soleil and DESY).
Our structures are routinely determined using molecular replacement or de-novo phasing by Se-Met or SAD/MAD. We also have the capability to process large numbers of fragment-screen datasets (such as XCHEM) using our bespoke AWS pipeline utilising PanDDA software.
Nuclear Magnetic Resonance (NMR)
NMR background
Nuclear Magnetic Resonance (NMR) is a non-destructive biophysical technique that allows the investigation of small molecules, peptides, proteins, nucleic acids and carbohydrates in solution at atomic resolution. NMR explores the magnetic properties of nuclei within the sample and how they are perturbed by intra- and intermolecular interactions with their environment. From this, dynamic and structural information about the sample can be extracted. It is important to highlight that only certain isotopes, such as 1H, 15N, 13C, 19F and 31P, are magnetically active in NMR, and their natural abundance varies: 99.9% (1H), 0.37% (15N), 1.1% (13C), 100% (19F) and 100% (31P). In order to pursue structural studies by NMR, it is therefore necessary to enrich samples with the low-abundance isotopes (15N and 13C) through recombinant expression. For the purposes of this article, we are going to focus our description on polypeptides and proteins.
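A short calculation shows why enrichment matters. Experiments that correlate a 13C nucleus with a neighbouring 15N nucleus only see molecules in which both positions carry the active isotope. The sketch below uses the natural abundances quoted above; the 85% incorporation level is taken as an example value, and treating the two sites as independent is a simplifying assumption.

```python
# Natural abundances as quoted in the text (fractions, not percentages).
NATURAL_ABUNDANCE = {"1H": 0.999, "15N": 0.0037, "13C": 0.011, "19F": 1.0, "31P": 1.0}


def pair_fraction(p_first, p_second):
    """Fraction of molecules in which two given sites both carry the active isotope.

    Assumes the two sites are labelled independently of each other.
    """
    return p_first * p_second


natural = pair_fraction(NATURAL_ABUNDANCE["13C"], NATURAL_ABUNDANCE["15N"])
enriched = pair_fraction(0.85, 0.85)  # example: 85% incorporation of each isotope

print(f"13C-15N pairs at natural abundance: {natural:.2e}")
print(f"13C-15N pairs at 85% incorporation: {enriched:.2f}")
print(f"gain from enrichment: roughly {enriched / natural:,.0f}-fold")
```

At natural abundance only about 4 in 100,000 molecules carry both isotopes at a given pair of sites, whereas at 85% incorporation roughly 72% do.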
NMR Protein structure determination workflow
Sample and technical requirements
Sample and buffer conditions:
Polypeptides below 5 kDa do not require isotopic enrichment, which is a great advantage for targets isolated from a natural source. Proteins in the range of 5-25 kDa require 15N and 13C isotope enrichment with at least 85% incorporation. High protein concentrations are required: concentrations above 200 µM in a volume of 250-500 µL are recommended. The protein target should be highly stable over time under the chosen conditions (5-8 days as a minimum), as conformational changes and/or degradation can compromise the data analysis and results. Phosphate or HEPES buffers are preferred, with a pH near or below 7.0 and salt concentrations below 200 mM.
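To translate these molar requirements into the amount of labelled protein needed, a simple conversion can be used. The sketch below is illustrative only; the 20 kDa molecular weight is a hypothetical example, not a value from the text.

```python
def protein_mass_mg(concentration_uM, volume_uL, mw_kDa):
    """Mass of protein (mg) needed for a sample of given concentration and volume."""
    moles = (concentration_uM * 1e-6) * (volume_uL * 1e-6)  # mol/L * L
    grams = moles * mw_kDa * 1000.0                          # kDa -> g/mol
    return grams * 1000.0                                    # g -> mg


# Hypothetical 20 kDa protein at the recommended minimum concentration.
small_sample = protein_mass_mg(concentration_uM=200, volume_uL=250, mw_kDa=20)
large_sample = protein_mass_mg(concentration_uM=200, volume_uL=500, mw_kDa=20)

print(f"250 µL sample: {small_sample:.1f} mg")
print(f"500 µL sample: {large_sample:.1f} mg")
```

For this example, a single 500 µL sample at 200 µM corresponds to about 2 mg of labelled protein, before any losses during purification.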
Instrumentation:
Overall, an NMR spectrometer is a series of coils fully immersed in cryogenic liquid that generate a massive magnetic field, over 564,000 times stronger than the Earth’s magnetic field. This field is necessary to excite and detect the tiny magnetic fields generated by the nuclei within the molecule. An NMR spectrometer of 14.1 T (600 MHz 1H frequency) or above, equipped with a cryoprobe, is the minimum requirement for structure determination.
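The link between field strength in tesla and the "MHz" used to name spectrometers is the Larmor relation, ν = (γ/2π)·B. The sketch below uses rounded literature values for γ/2π and assumes an Earth field of 25 µT, which is the value that reproduces the ratio quoted above (the real field varies with location).

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded magnitudes;
# the sign of the 15N ratio is negative but is ignored here).
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": 4.316}

EARTH_FIELD_T = 25e-6  # assumed value for the Earth's magnetic field


def larmor_mhz(field_t, nucleus="1H"):
    """Resonance frequency (MHz) of a nucleus in a field of field_t tesla."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * field_t


def field_for_1h_mhz(frequency_mhz):
    """Field strength (tesla) of a spectrometer named by its 1H frequency."""
    return frequency_mhz / GAMMA_BAR_MHZ_PER_T["1H"]


for nucleus in ("1H", "13C", "15N"):
    print(f"{nucleus} at 14.1 T: {larmor_mhz(14.1, nucleus):.1f} MHz")

print(f"Field of a 1000 MHz spectrometer: {field_for_1h_mhz(1000):.1f} T")
print(f"14.1 T relative to the Earth's field: {14.1 / EARTH_FIELD_T:,.0f}x")
```

The same relation explains why a "600 MHz" instrument observes 13C near 151 MHz and 15N near 61 MHz.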
Sygnature Discovery has access to a wide range of spectrometers (500, 600, 750, 800, 900, 950 and 1000 MHz) at the NMR platforms of Leeds and Birmingham Universities.
Protein labelling with 15N and 13C isotopes
The only method to incorporate 15N and 13C atoms into proteins is recombinant expression, predominantly using prokaryotic systems (a wide range of E. coli strains) and, to a lesser extent, some eukaryotic systems.
E. coli strains have the versatility to produce uniformly and specifically labelled proteins at very high yield and low cost. However, there are a couple of disadvantages to expressing proteins in E. coli: a) they cannot produce post-translational modifications; b) they lack the machinery to properly fold proteins of eukaryotic origin, which can sometimes lead to improper folding or, in the worst case, no folding at all (formation of inclusion bodies, IB). There are, however, multiple examples in the literature of success in overcoming folding issues.
Labelling a protein expressed in E. coli requires minimal media enriched with the specific isotope(s) required for incorporation, which can be quite expensive depending on the isotope(s). Before commencing with any expensive labelling, we recommend that a series of expression and purification tests are performed in order to determine suitable conditions to obtain soluble and properly folded protein of interest.
NMR data collection and processing
This task is divided into two sections:
1. The aim of the first block of experiments is to acquire a series of double and triple resonance NMR data sets, using 6-8 methodologies, to reveal the chemical shifts of the amide proton, N, CO, Cα and Cβ atoms.
These data will be used to assign the backbone and side chains of the respective protein.
2. After the data from step 1 have been analysed, a second block of triple resonance NMR experiments (using 3-5 methodologies) is acquired to identify the intramolecular 1H-1H network that will be crucial to fold the respective protein model.
Each data set is processed using dedicated NMR software that applies a series of mathematical functions to obtain symmetric, positive resonances.
Main- and side-chain assignment.
It is important to mention that every atom in the protein has a specific chemical shift (CS) that is influenced by the local atomic environment, resulting in multiple positions in the NMR spectra even for the same kind of atom. A rational approach is taken to determine the CS and nature of every H, N and C atom of the amino acids within the protein in the 2D and 3D NMR experiments in order to build up a series of inter-atomic connections. Depending on the data quality, this process can be done automatically or fully interactively by the user.
Extracting experimental restraints and protein folding.
The purpose of this phase is to collate all available experimental information on the protein, from NMR or any other biophysical technique, in terms of a) inter-atomic H-H distances within a range of 5 Å, b) phi and psi torsion angles, c) H bonds, d) HN or CH orientations with respect to the spectrometer magnetic field (Residual Dipolar Couplings, RDCs). All of this information is applied as restraints to properly fold the 3D model of the protein structure in solution.
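As a rough illustration of how a distance restraint acts on a model, the sketch below scores a set of proton coordinates against upper-bound H-H distances. The flat-bottomed quadratic penalty, the force constant, the atom names and the coordinates are all assumptions chosen for illustration; structure calculation packages each use their own potential forms and weights.

```python
import math


def restraint_energy(coords, restraints, force_constant=1.0):
    """Sum of penalties for H-H distances that exceed their upper bound.

    coords:     dict mapping atom name -> (x, y, z) in Å
    restraints: list of (atom_a, atom_b, upper_bound_in_Å)
    A pair within its bound contributes nothing; beyond it, the penalty
    grows with the square of the excess distance.
    """
    total = 0.0
    for atom_a, atom_b, upper in restraints:
        excess = math.dist(coords[atom_a], coords[atom_b]) - upper
        if excess > 0:
            total += force_constant * excess ** 2
    return total


# Hypothetical restraints: both proton pairs were observed within 5 Å.
restraints = [("HA_1", "HB_7", 5.0), ("HA_1", "HN_12", 5.0)]

extended_model = {"HA_1": (0.0, 0.0, 0.0), "HB_7": (3.0, 4.0, 0.0), "HN_12": (0.0, 0.0, 8.0)}
folded_model = {"HA_1": (0.0, 0.0, 0.0), "HB_7": (3.0, 4.0, 0.0), "HN_12": (0.0, 0.0, 4.5)}

energy_extended = restraint_energy(extended_model, restraints)
energy_folded = restraint_energy(folded_model, restraints)
print("extended model penalty:", energy_extended)  # 9.0
print("folded model penalty:", energy_folded)      # 0.0
```

A model that satisfies every restraint scores zero, so minimising this kind of term alongside the usual chemical energy terms pulls the model towards conformations consistent with the NMR data.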
Folding of the protein model proceeds through cycles of fast unfolding and slow refolding by molecular dynamics (MD) simulation in implicit solvent. During the cooling process, the protein adopts the experimental restraints provided to the NMR algorithm (many are available in the literature, including XPLOR, ROSETTA, CYANA and UNIO).
Then, the 3D structure model is refined in explicit solvent to eliminate any atomic violation within the protein. Finally, the 3D model of the protein structure is reported as an ensemble of the best 10-20 structures that have the lowest energy after the refining stage.
Our current NMR services at Sygnature Discovery
We are delighted to offer our clients a wide range of NMR services, from protein quality control and ligand-observed fragment screening to peptide/protein assignment and structure determination of proteins.
Single Particle Electron CryoMicroscopy (Cryo-EM)
Cryo-EM background
Structure determination by cryo-EM is often thought of as a new technique; in fact, today’s success story as a high-resolution technique that rivals X-ray crystallography and NMR is built on decades of development. Since the first electron microscopes were built in the 1940s, developments in hardware, data collection and processing strategies, computing (both software and hardware), and sample preparation methodologies have culminated in the so-called ‘resolution revolution’. On the hardware side, the development of field emission guns, effective vacuums, and more recently high-speed electron detectors has led to highly efficient microscopes. The move in sample preparation from negative stain to vitreous ice has allowed for a reduction in radiation damage and higher resolution data; cross-linking methods have also been developed to help maintain fragile complexes. Data collection methods that use low-dose imaging of frozen samples and record each exposure as a movie promote high-resolution data. Methods for 2D classification through to 3D reconstruction, correction for the contrast transfer function and for Ewald sphere curvature, and the application of machine learning all contribute to data processing pipelines. Computational developments allow for fast transfer and storage of terabytes of data, and graphics processing units can be used to process data efficiently. The first 3D reconstructions from cryo-EM data were published in the 1980s and were in the region of 45 Å resolution. The first sub-nm resolution structures were published in 1997; the relatively low-resolution envelopes of macromolecules led to the commonly used term ‘blobology’ for cryo-EM. For over ten years now, resolutions achievable by cryo-EM have rivalled those achievable by X-ray crystallography, and cryo-EM can be used routinely for structure-based drug discovery programs.
Cryo-EM Protein structure determination workflow
The first step in determining a structure by cryo-EM, and arguably the most important, is to obtain a pure, homogenous sample. The sample is then rapidly frozen in liquid ethane, supported on a grid, to achieve a thin film of sample in a hydrated state that is free from ice crystals.
The sample is then transferred to a transmission electron microscope under cryogenic conditions, where a beam of electrons interacts with the sample and images are formed and magnified on the detector. Often, thousands of images are collected; the individual protein molecules in each image are then picked, in silico, and sorted based on their orientation. These images are then combined to provide a map into which, if the resolution permits, a set of coordinates can be modelled to allow for ease of interpretation.
Sample and technical requirements
There are a number of sample requirements that should be taken into consideration before pursuing structure determination by cryo-EM. First of all, size matters; the larger the protein (or macromolecular complex), the easier it is to see. Generally speaking, macromolecules larger than 150 kDa are more straightforward than smaller ones. However, if the protein of interest is small, its size can be augmented by using nanobodies, antibodies, legobodies or scaffolds (Liu et al., 2019). Size is not everything: it must also be possible to orient the particle of interest, so particles that have no or limited distinguishing features are also difficult to study by cryo-EM. For example, an integral membrane protein that is embedded in a detergent micelle or nanodisc and has no extra-membrane features is challenging to align. It is also important that the sample is fairly homogeneous and monodisperse within the ice and has a minimum of components in the buffer. For example, high glycerol and high detergent contents give a very high background in the images, reducing the contrast and making it more difficult to pick out the particles of interest (low signal-to-noise ratio). The good news is that there is no requirement for crystals and no restriction on pH. In comparison with the other structural techniques, cryo-EM requires a low volume and low concentration of sample: 3-4 µL per grid at ~1 mg/ml for protein complexes, or up to ~10 mg/ml for some membrane proteins. Usually, 2-10 grids are made initially to screen conditions such as protein concentration and blotting parameters.
In order to view macromolecules under a transmission electron microscope for the purpose of single particle analysis, they first need to be applied to a support. This support, commonly referred to as a grid, is a foil disc 3 mm in diameter, usually made from copper or gold with a carbon support film. The most common method for applying a sample to a grid is to pipette 3-4 µL onto the grid, remove any excess by blotting/wicking onto filter paper to leave a 10-100 nm layer, and plunge-freeze the grid quickly in a bath of liquid ethane (Dubochet et al., 1988). Liquid ethane is used for freezing grids to cryogenic temperatures as it has a higher heat capacity than liquid nitrogen and so freezes the sample faster, vitrifying the water molecules (an amorphous, glass-like phase) and avoiding ice crystal formation. Other, less common, sample application techniques include spraying/spotting pL-nL volumes of sample over the grid surface before plunge freezing, and/or using self-wicking grids, which eliminate the need to blot off excess sample. A series of grids is usually prepared while the sample is in its freshest/most optimal state.
A thin layer of vitrified ice containing monodispersed macromolecules in random orientations is required for data collection.
Once made, the grids are maintained at cryogenic temperatures for data collection. Before loading into a 300 kV or 200 kV microscope, they need to be ‘clipped’: each grid is sandwiched in a small frame between an o-ring and a c-clip. This increases the stability of the grid and allows the sample changer within the microscope to move the grids onto the stage without damaging them.
Once loaded into the microscope, the grid is imaged under low dose conditions to view the sample with minimal radiation damage. First, a series of images at low magnification is taken and stitched together to give an overall view of the whole grid (atlas). Areas of the grid that appear to have a reasonable thickness of vitrified ice (not too dark, but not empty) are selected to image at higher magnifications. A number of microscope settings need to be optimised while screening areas of the grid to ensure that the focus, and therefore the defocus, settings are accurate.
Data collection and processing
Once areas of the grid are identified as having optimal ice thickness and particle distribution, data collection can be set up. Again, a number of microscope settings need to be optimised for data collection, including the dose rate, defocus range and exposure time. Due to the developments in fringe-free imaging, exposure rates of 400-600 movies per hour are often achieved, allowing for fast data collection. Advances in multi-grid data collection enable numerous samples to be collected in one microscope session, thus increasing efficiency. Micrographs are collected as a movie series to mitigate small movements during data collection. The movie frames are then aligned, and processing of the data can commence. Particles, which should be present in multiple orientations, need to be identified on each micrograph; particle picking can be performed manually, automatically using a template, or using machine-learning-based algorithms. These particles are then aligned computationally to determine whether they align with each other in 2D. A number of groups/classes is specified to enable alignment of a number of orientations and to eliminate any rogue/junk particles. The particles that align in 2D are then used to generate an initial model and are aligned in 3D, again using a defined number of classes to determine groups of particles that align well to each other. Finally, the full 3D reconstruction of the particle is subjected to particle polishing and CTF (contrast transfer function) refinement to get the best out of the data.
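The idea behind aligning movie frames can be sketched with a one-dimensional toy example. This is not how any particular motion-correction program works: real software operates on 2D frames with sub-pixel shifts, noise and dose weighting. Here the "particle" signal, the drift values and the use of whole-pixel circular shifts are all assumptions made to keep the example small.

```python
def circular_shift(signal, shift):
    """Move every feature in the signal `shift` pixels to the right (wrapping round)."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def estimate_shift(reference, frame):
    """Whole-pixel drift of `frame` relative to `reference`.

    Tries every possible shift and keeps the one giving the highest
    circular cross-correlation with the reference.
    """
    n = len(reference)

    def score(s):
        return sum(reference[i] * frame[(i + s) % n] for i in range(n))

    return max(range(n), key=score)


def align_and_average(frames):
    """Undo the estimated drift of each frame, then average the frames."""
    reference = frames[0]
    n = len(reference)
    aligned = []
    for frame in frames:
        s = estimate_shift(reference, frame)
        aligned.append([frame[(i + s) % n] for i in range(n)])
    return [sum(column) / len(frames) for column in zip(*aligned)]


# Hypothetical 16-pixel "particle" and a movie in which it drifts between frames.
particle = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
true_drift = [0, 2, 3, 5]
frames = [circular_shift(particle, d) for d in true_drift]

estimated_drift = [estimate_shift(frames[0], f) for f in frames]
naive_average = [sum(column) / len(frames) for column in zip(*frames)]
aligned_average = align_and_average(frames)

print("estimated drift:", estimated_drift)
print("peak height without alignment:", max(naive_average))
print("peak height with alignment:", max(aligned_average))
```

Averaging without alignment smears the signal over several pixels, whereas averaging after alignment restores the original peak, which is the reason frames are aligned before any further processing.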
Only at the final stage of the processing is the overall resolution of the data set realised and the model coordinates are then able to be built and refined to best fit the map.
Our current Cryo-EM services at Sygnature Discovery
We offer the full gene to structure pipeline, and any part thereof. We have recently invested in a vitrobot IV for grid preparation and a dedicated AKTAmicro system to enable small quantities of sample to be purified with minimal dilution at the final polishing step. We have agreements with a number of facilities to use 300 kV and 200 kV microscopes for grid screening and data acquisition. Our processing pipeline is set up on AWS and so is efficient and scalable for numerous parallel projects.
Technique | Advantages | Disadvantages |
---|---|---|
X-Ray Crystallography | | |
NMR | | |
Cryo-EM | | |
References
Dubochet, J., Adrian, M., Chang, J.-J., Homo, J.-C., Lepault, J., McDowall, A. W., & Schultz, P. (1988). Cryo-electron microscopy of vitrified specimens. Quarterly Reviews of Biophysics, 21(2), 129–228. https://doi.org/10.1017/S0033583500004297
Liu, Y., Huynh, D. T., & Yeates, T. O. (2019). A 3.8 Å resolution cryo-EM structure of a small protein bound to an imaging scaffold. Nature Communications, 10(1), 1864. https://doi.org/10.1038/s41467-019-09836-0