GP challenge: evolving the energy function for protein structure prediction

Decoys

In our experiments we used a set of candidate protein structures (decoys) generated by the I-TASSER ab initio predictor. For each of 56 non-homologous small protein chains, I-TASSER generated between 12,500 and 20,000 decoys.
We used 54 of these chains (excluding 1ogwA and 1cy5A) and sampled every 10th decoy in order of generation.

Energy terms

We implemented 8 selected I-TASSER energy terms and calculated their values for each decoy:

We left out the energy terms that use data from the threading process (e.g. distance map or contact order), as well as the hydrophobic potential, because they depend on external feature predictors.

Download energy terms: energy_terms.tar.gz [2.8 MB]

The archive contains 54 files, one for each protein. Each line in a file contains a space-separated list of energy terms for a single decoy. The decoys (lines) in each file are sorted in increasing order of the original I-TASSER energy.

Line format: T1 T2 T3 T4 T5 T6 T7 T8
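
As an illustration of the line format above, here is a minimal Python sketch for reading one per-protein file. The function name parse_energy_terms and the sample values are hypothetical; the names of the files inside the archive are not listed here, so the path is left to the reader.

```python
def parse_energy_terms(lines):
    """Parse lines of the form 'T1 T2 ... T8' into a list of 8-float tuples.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    decoys = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 8:
            raise ValueError(
                f"line {line_no}: expected 8 energy terms, got {len(fields)}"
            )
        decoys.append(tuple(float(f) for f in fields))
    return decoys


# Made-up values, only to show the expected shape of the data.
sample = [
    "-1.5 0.2 3.0 4.1 -0.7 2.2 0.0 1.0",
    "-1.2 0.3 2.9 4.0 -0.6 2.1 0.1 1.1",
]
terms = parse_energy_terms(sample)

# With a real file from the archive (path is a placeholder):
# with open(path_to_protein_file) as fh:
#     terms = parse_energy_terms(fh)
```

Each tuple holds T1 to T8 for one decoy, in file order, so the index of a tuple matches the corresponding line of the file.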

Distance to the native

For each decoy we measured its similarity to the known native structure. As the measure we used the root mean square deviation (RMSD) between the 3D coordinates of the Cα atoms of the two structures, minimised with respect to rotation.

Each decoy was assigned a rank based on increasing RMSD, with tied decoys receiving the average of their ranks. Two decoys were considered tied when their RMSD values agreed to the first two decimal places.
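
The ranking rule can be sketched in Python as follows. This is not the code used to produce the published ranks: the function name average_ranks is hypothetical, and two details are assumptions, namely that "the same up to the first two decimal places" is implemented by rounding (rather than truncating) and that ranks start at 1.

```python
def average_ranks(rmsds, decimals=2):
    """Rank values in increasing order, averaging the ranks of tied values.

    Assumption: two RMSD values are tied when they are equal after
    rounding to `decimals` decimal places. Ranks are assumed to start at 1.
    """
    keys = [round(r, decimals) for r in rmsds]
    # Indices of the decoys, sorted by rounded RMSD.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0.0] * len(keys)
    start = 0
    while start < len(order):
        # Extend `end` over the run of decoys tied with order[start].
        end = start
        while end + 1 < len(order) and keys[order[end + 1]] == keys[order[start]]:
            end += 1
        # Positions start..end hold 1-based ranks start+1..end+1; take the mean.
        mean_rank = (start + 1 + end + 1) / 2.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Made-up RMSD values: the first and third agree to two decimal places.
print(average_ranks([2.314, 1.052, 2.309, 4.7]))  # [2.5, 1.0, 2.5, 4.0]
```

The two tied decoys would occupy ranks 2 and 3, so each receives 2.5.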

Download distances/ranks: distances.tar.gz [384 kB]

The archive contains 54 files, one for each protein. Each line in a file contains a space-separated list of 3 values: rank, RMSD, and the original I-TASSER energy. The order of decoys (lines) in each file is the same as in the energy term files.

Line format: rank RMSD energy
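
A minimal Python sketch for reading a distances file and checking the stated ordering by I-TASSER energy. The names DecoyDistance, parse_distances and is_sorted_by_energy are hypothetical, and the sample values are made up.

```python
from typing import NamedTuple


class DecoyDistance(NamedTuple):
    rank: float    # rank by increasing RMSD (may be fractional for ties)
    rmsd: float    # RMSD to the native structure
    energy: float  # original I-TASSER energy


def parse_distances(lines):
    """Parse lines of the form 'rank RMSD energy' into DecoyDistance records."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 values, got {len(fields)}"
            )
        records.append(DecoyDistance(*(float(f) for f in fields)))
    return records


def is_sorted_by_energy(records):
    """Check that the records are in non-decreasing order of energy."""
    return all(a.energy <= b.energy for a, b in zip(records, records[1:]))


# Made-up values, only to show the expected shape of the data.
sample = [
    "2.5 2.31 -310.4",
    "1.0 1.05 -305.9",
    "2.5 2.31 -290.0",
]
records = parse_distances(sample)
print(is_sorted_by_energy(records))  # True
```

Because the decoy order matches the energy term files, record i here describes the same decoy as line i of the corresponding energy term file.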

Extras

High-resolution versions of the plots from our paper, together with several extra plots not included there, are available in the two image galleries listed below: