How to plot overlaid residue data plot and free energy landscapes

Step 1: Calculate the required quantities using GROMACS.

Root mean squared deviation (RMSD) and radius of gyration are not required for creating overlaid residue plot, but they would be used to create energy landscapes. Ensure that the processed trajectory and structures are provided, in which the periodic boundary have been corrected for.

Root mean squared deviation (RMSD)

The structure provided here will be used as the reference for calculating RMSD of the trajectory.

gmx rms -f trajectory.trr -s structure.gro -o rmsd.xvg

Radius of gyration

gmx gyrate -f trajectory.trr -s structure.gro -o rg.xvg

Root mean squared fluctuation (RMSF)

gmx rmsf -f trajectory.trr -s structure.gro -res -o rmsf.xvg

This would align the whole molecule for RMSF calculation. You may calculate the RMSF of each chain separately in a protein with multiple chains. This would reflect the fluctations only due to backbone motion and elimiate the fluctuation from relative motion of chains.

Solvent accessible surface area (SASA)

To calculate the average SASA per residue pass -or option to gmx sasa.

gmx sasa -f trajectory.trr -s structure.gro -o sasa.xvg -or resarea.xvg

Secondary structure

For the GROMACS command do_dssp to work the DSSP binary must be available on your system.
Download DSSP binaries here

gmx do_dssp -f trajectory.trr -s structure.gro \
            -o dssp \
            -ssdump dssp \
            -sc dssp_count

-ssdump option will output a file dssp.dat containing the secondary structure for the full trajectory as single letter codes. This would be processed by MD DaVis to calculate the secondary structure persistence at each residue.

Code Secondary structure
H α-helix
G 310-helix
I π-helix
B β-bridge
E β strand
T Turn
S Bend
~ Loop

Step2: Create HDF file with all the data

Step 2a: Obtain the sequence of the protein from the PDB file used to start the simulation.

md_davis sequence structure_used_for_simulation.pdb

Step 2b: Provide this sequence in JSON file below, along with a few other properties. Note that for multi-chain proteins the sequence for each chain would be separated by a '/'.

{
    "label": "MD Simulation",
    "short_label": "MD",
    "html": "<i>MD Simulation</i>",
    "short_html": "<i>MD Simulation</i>",
    "protein": "protein name",
    "scientific_name": "some organism",
    "common_name": "common name",
    "sequence": "PUT/YOUR/SEQUENCE/HERE"
}

The most important property here is the sequence, which tells md_davis collect of the number of chains in the molecule and the number of residues in each chain. The short_html will determine the labels for the data in the final plots. This file is named information.json in the next command.

Step 2c: Collect all the output files generated by GROMACS analysis tools into a single HDF file using the following command:

md_davis collect \
    --backbone_rmsd rmsd.xvg --backbone_rg rg.xvg \
    --trajectory trajectory.trr --structure structure.gro
    --rmsf rmsf.xvg 0 500 \
    --ss dssp.dat \
    --sasa resarea.xvg \
    --info information.json \
    output1.h5

If the --trajectory and --structure options are provided. MD DaVis will calculate the backbone dihedral angles for all frames and the circular standard deviation of each dihedral angle.

Note the numbers at the end of the --rmsf options are the start and end time for the RMSF calculation in nanosecond. These will be inserted as attributes in the HDF file and must be provided. In case, the RMSF for each chain was calculated separately, the files may be provided to --rmsf option in the correct order followed by the start and end times.

Additional details are available with -h option for each MD DaVis command, such as

md_davis collect -h

Step 3: Plotting overlaid residue data

Step 3a: Create a pickle file with the residue dataframe using:

md_davis residue dataframe -p name1 output1.h5 data1.p

The optional argument -a annotations.json can be provided to place a mark at certain residue locations. The contents of annotations.json should be of the following form:

{
    "chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
    "chain 1": {"Nucleotide Binding Regions": [15, 18]}
}

Each type of annotation is rendered with a different mark. Following annotations are available at present:

Step 3b: Plot the residue data pickle file from the previous command using:

md_davis plot residue data1.p data2.p

Step 4: Free energy Landscapes

Create and plot free energy landscapes using common bins and ranges

md_davis landscape rmsd_rg -T 300 --common --select backbone output1.h5 output2.h5 -s landscapes.h5

This command will create an html file with the interactive landscapes. It will not open the file like other plotting commands, so check the working directory for the output html file.

Plot free energy landscape overlaid with trajectory points

One must save the landscape created by the previous command with -s before this one can be used. Since the output generated for single landscape is big, visualization of multiple landscapes becomes impractical. So, it only plots one landscape at a time. Select the desired landscape in landscapes.h5 by providing its index with -i. By default only the first landscape is plotted

md_davis landscape animation landscapes.h5 -i 0 --static -o trajectory.html