Maintaining and Enhancing Diversity of Sampled Protein Conformations in Robotics-Inspired Methods


The ability to efficiently sample structurally diverse protein conformations allows one to gain a high-level view of a protein’s energy landscape. Algorithms from robot motion planning have been used for conformational sampling and promote diversity by keeping track of “coverage” in conformational space based on the local sampling density. However, large proteins present special challenges. In particular, larger systems require running many concurrent instances of these algorithms, but these algorithms can quickly become memory intensive because they typically keep previously sampled conformations in memory to maintain coverage estimates. Additionally, many of these algorithms depend on defining useful perturbation strategies for exploring the conformational space, which is difficult for large proteins because such systems are typically more constrained and exhibit complex motions. In this paper, we introduce two methodologies for maintaining and enhancing diversity in robotics-inspired conformational sampling. The first method uses a low-dimensional projection to define a global coverage grid that maintains coverage across concurrent runs of sampling. The second method automatically defines a perturbation strategy from readily available flexibility information derived from B-factors, secondary structure, and rigidity analysis. Our results show a significant increase in the diversity of the conformations sampled for proteins consisting of up to 500 residues. The methodologies presented in this paper may be vital components for the scalability of robotics-inspired approaches.
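
To make the first idea concrete, the following is a minimal sketch of a coverage grid defined over a low-dimensional projection. It is an illustration under stated assumptions, not the implementation used in the paper: the abstract does not specify the projection, so PCA computed with an SVD stands in for it, and the names `fit_projection`, `CoverageGrid`, `cell_size`, `selection_weight`, and `merge` are hypothetical. The point it illustrates is that only per-cell counts are stored, so coverage can be tracked and shared without keeping every sampled conformation in memory.

```python
import numpy as np


def fit_projection(reference_coords, n_components=2):
    """Fit a linear projection (PCA via SVD; an assumed stand-in).

    reference_coords has shape (n_conformations, n_features), one
    flattened conformation per row.
    """
    mean = reference_coords.mean(axis=0)
    _, _, vt = np.linalg.svd(reference_coords - mean, full_matrices=False)
    return mean, vt[:n_components]


class CoverageGrid:
    """Per-cell sample counts in the projected space (hypothetical helper)."""

    def __init__(self, mean, components, cell_size):
        self.mean = np.asarray(mean, dtype=float)
        self.components = np.asarray(components, dtype=float)
        self.cell_size = float(cell_size)
        self.counts = {}  # maps a grid cell (tuple of ints) to a count

    def cell_of(self, conformation):
        """Project a conformation and return the index of its grid cell."""
        centered = np.asarray(conformation, dtype=float) - self.mean
        projected = centered @ self.components.T
        return tuple(np.floor(projected / self.cell_size).astype(int).tolist())

    def add(self, conformation):
        """Record one sampled conformation; only the count is kept."""
        cell = self.cell_of(conformation)
        self.counts[cell] = self.counts.get(cell, 0) + 1
        return cell

    def selection_weight(self, conformation):
        """Weight that favors conformations in sparsely sampled cells."""
        return 1.0 / (1 + self.counts.get(self.cell_of(conformation), 0))

    def merge(self, other):
        """Fold in the counts gathered by another sampling run."""
        for cell, count in other.counts.items():
            self.counts[cell] = self.counts.get(cell, 0) + count
```

In this sketch, each concurrent run would record its samples with `add` and periodically combine its counts with the others through `merge`, so that `selection_weight` reflects coverage across all runs rather than a single one. The inverse-count weighting is one simple choice among many.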

Journal of Computational Biology, 25(1)
Jayvee R. Abella
Senior Data Scientist / Machine Learning Engineer