
Clustering multiple fits in a cryoEM map

A. Input files:

1. EM density map (MRC or XPLOR format).
2. A file listing the PDB files of the different fits to be scored and clustered (for example, "list_pdbs.txt").
3. A sub-directory containing all of these PDB files (for example, "final").
4. Download the following scripts:

run_score.py

score.py

process.py

cluster.py

B. Edit the following files:

run_score.py (in the INPUT PARAMETERS section):
1. Set the input parameters of the EM map (MRC or XPLOR format). Make sure that the origin is specified in Å.
2. Specify the path of your work directory.
3. Specify the name of your results directory (e.g., results_dir = 'final/').
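Because the origin must be given in Å, it can help to check what the map header actually stores. The sketch below reads a standard 1024-byte MRC2014 header with Python's struct module. It assumes a little-endian file and the default axis order (MAPC, MAPR, MAPS = 1, 2, 3), and the function name mrc_origin_angstrom is hypothetical. It reports the voxel size (CELLA / MX), the ORIGIN field, and the NXSTART-style start indices converted to Å; which of the two origin conventions your map uses depends on the program that wrote it.

```python
import struct


def mrc_origin_angstrom(header):
    """Read voxel size and origin (in Å) from a 1024-byte MRC2014 header.

    Assumes a little-endian file and the default axis order
    (MAPC, MAPR, MAPS = 1, 2, 3). Returns (voxel_size, origin_field,
    origin_from_start), each an (x, y, z) tuple.
    """
    if len(header) < 1024:
        raise ValueError("an MRC header is 1024 bytes long")
    nxstart, nystart, nzstart = struct.unpack_from("<3i", header, 16)  # words 5-7, grid units
    mx, my, mz = struct.unpack_from("<3i", header, 28)                 # words 8-10, sampling
    cella = struct.unpack_from("<3f", header, 40)                      # words 11-13, cell size in Å
    origin = struct.unpack_from("<3f", header, 196)                    # words 50-52, origin in Å
    if 0 in (mx, my, mz):
        raise ValueError("MX, MY and MZ must be non-zero")
    voxel = (cella[0] / mx, cella[1] / my, cella[2] / mz)
    # NXSTART/NYSTART/NZSTART are grid indices; multiplying by the voxel size gives Å.
    start = (nxstart * voxel[0], nystart * voxel[1], nzstart * voxel[2])
    return voxel, origin, start
```

For example, reading the first 1024 bytes of your map (opened in binary mode) and passing them to mrc_origin_angstrom gives the voxel size and both candidate origins in Å, which can then be compared with the values set in run_score.py.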

list_pdbs.txt: the names of the PDB files corresponding to the different fits that you want to score and cluster.
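If all the fits are already in the results sub-directory, the list can be generated rather than typed by hand. This is a minimal sketch that assumes list_pdbs.txt should hold bare file names, one per line, without the 'final/' prefix, and that every fit ends in .pdb; write_pdb_list is a hypothetical helper, so check the expected format against run_score.py before relying on it.

```python
from pathlib import Path


def write_pdb_list(fits_dir="final", list_file="list_pdbs.txt"):
    """Write the sorted names of all *.pdb files in fits_dir, one per line.

    Only bare file names are written (no directory prefix); this is an
    assumption about what run_score.py expects.
    """
    names = sorted(p.name for p in Path(fits_dir).glob("*.pdb"))
    Path(list_file).write_text("".join(name + "\n" for name in names))
    return names
```

Run from the work directory, write_pdb_list() scans "final" and writes "list_pdbs.txt" next to it; files without the .pdb extension are ignored.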

C. Scoring:

Score the different fits with MODELLER/Mod-EM (CCF, stereochemical and non-bonded interaction terms) by running the following command:

mod9v9 run_score.py > runs_score.log
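The CCF term measures how well the density computed from a fit agrees with the EM map. As a rough illustration only, the sketch below computes a generic mean-subtracted (Pearson) cross-correlation between two sets of voxel values that are assumed to be already sampled on the same grid. Mod-EM's actual CCF definition, and how it blurs the model to the map resolution, may differ; normalized_ccf is a hypothetical name.

```python
import math


def normalized_ccf(model_density, map_density):
    """Mean-subtracted cross-correlation of two equal-length voxel lists.

    Returns a value in [-1, 1]; 1 means the model density and the map
    density vary together perfectly on the shared grid.
    """
    if len(model_density) != len(map_density) or not model_density:
        raise ValueError("densities must be non-empty and on the same grid")
    n = len(model_density)
    mean_a = sum(model_density) / n
    mean_b = sum(map_density) / n
    da = [a - mean_a for a in model_density]
    db = [b - mean_b for b in map_density]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant density")
    return numerator / denominator
```

Subtracting the means makes the score insensitive to a constant offset in either density, which is one common choice; an un-centred variant is also used in the literature.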

D. Processing:

Prepare the file used for clustering ("list_pdb_score"):

python process.py

Output files:
- score_sum.txt - a complete summary of the scores
- list_pdb_score - a reduced list of the scores

E. Clustering:

Cluster the fits based on Cα RMSD (starting from the best scoring model) using the following command:

mod9v9 cluster.py list_pdb_score cutoff_rmsd score_column

Parameters to set:
- cutoff_rmsd - the Cα RMSD cutoff (in Å) used to cluster the solutions, for example '3.5' for 3.5 Å.
- score_column - the column in "list_pdb_score" used to order the fits for clustering ('2' for the total energy or '6' for the CCF only). For example:

mod9v9 cluster.py list_pdb_score 3.5 2
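The clustering step can be pictured as a greedy, leader-based procedure: sort the fits by the chosen score column, make the best unassigned fit the first member of a new class, and add every remaining unassigned fit within cutoff_rmsd of it. The sketch below follows that reading of "starting from the best scoring model" and is not taken from cluster.py. It assumes list_pdb_score is whitespace-separated with the PDB name in column 1, that column numbers are 1-based, and that lower scores are better unless higher_is_better is set (check which direction applies to the CCF column). The RMSD function is passed in, and read_scores and greedy_cluster are hypothetical names.

```python
def read_scores(lines, score_column):
    """Return [(pdb_name, score)] from whitespace-separated lines.

    Column 1 is assumed to hold the PDB name and score_column is 1-based.
    Lines that are empty, too short, or whose score is not numeric
    (for example a header line) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        try:
            rows.append((fields[0], float(fields[score_column - 1])))
        except (IndexError, ValueError):
            continue
    return rows


def greedy_cluster(rows, rmsd, cutoff, higher_is_better=False):
    """Leader clustering: each class starts from the best unassigned fit.

    Returns a list of classes; each class is a list of
    (pdb_name, distance_to_leader), with the leader first at 0.0.
    """
    ordered = [name for name, _ in
               sorted(rows, key=lambda r: r[1], reverse=higher_is_better)]
    assigned = set()
    classes = []
    for leader in ordered:
        if leader in assigned:
            continue
        assigned.add(leader)
        members = [(leader, 0.0)]
        for other in ordered:
            if other in assigned:
                continue
            d = rmsd(leader, other)
            if d <= cutoff:
                members.append((other, d))
                assigned.add(other)
        classes.append(members)
    return classes
```

With a file open, read_scores(fh, 2) mirrors the score_column of the example command, and greedy_cluster(rows, rmsd, 3.5) mirrors its 3.5 Å cutoff. Because each fit is compared only with class leaders, two members of the same class can be further apart than the cutoff.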

Output file:
- classes.txt - a self-explanatory listing of the classes (the lrms column is the Cα RMSD of each fit from the first fit in its class).
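For reference, the Cα RMSD between two fits of the same model can be computed directly in the map frame. The sketch below reads CA atoms from PDB ATOM records using the fixed-column format, pairs them by chain, residue number and insertion code, and computes the RMSD without any superposition, on the assumption that all fits share the map's coordinate frame. Whether cluster.py superposes the fits first is not stated here, and read_ca_coords and ca_rmsd are hypothetical names.

```python
import math


def read_ca_coords(pdb_text):
    """Return {(chain, resSeq, iCode): (x, y, z)} for CA atoms in ATOM records.

    Uses the fixed PDB columns; only blank or 'A' alternate locations are
    considered, and the first occurrence of each residue is kept.
    """
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16:17] not in (" ", "A", ""):
            continue
        key = (line[21:22], line[22:26].strip(), line[26:27])
        coords.setdefault(key, (float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def ca_rmsd(pdb_text_a, pdb_text_b):
    """Cα RMSD over residues present in both fits, without superposition."""
    a = read_ca_coords(pdb_text_a)
    b = read_ca_coords(pdb_text_b)
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("the two fits share no CA atoms")
    total = sum((p - q) ** 2 for key in common for p, q in zip(a[key], b[key]))
    return math.sqrt(total / len(common))
```

Reading two PDB files as text and passing them to ca_rmsd gives a distance in Å that can be compared with the lrms values in classes.txt or used as the pairwise distance for clustering.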