Zero-shot (no training data) machine learning approaches for protein and peptide optimization.
Ewa Lis
CEO, Koliber Biosciences Inc.
The introduction of DALL-E in 2021 and the public release of ChatGPT in late 2022 have ushered in a new era in artificial intelligence, enabling humans to work alongside AI to generate images and text. These tools are fundamentally changing the way humans create by providing a variety of starting points, accelerating the process, and eliminating tedious tasks. The awe-inspiring success of ChatGPT rests on a novel model architecture (the transformer) and on self-supervised training, in which the model learns to predict withheld tokens and can therefore leverage vast unlabeled datasets. As with natural language, transformers can be trained on amino acid sequence data, typically by masking residues and predicting them from context, which enables models that capture an evolutionary understanding of peptides and proteins.
This presentation demonstrates the progression of the Koliber AI peptide/protein platform toward minimizing the dataset sizes required to train machine learning models. A suite of applications was explored, including antimicrobial and immune-modulating peptides. The models were trained on a wide variety of peptide datasets, including cyclic peptides and peptides with non-canonical amino acids. Examples are shown of zero-shot (de novo) predictions of substitutions that enhance enzyme function by increasing activity and broadening substrate specificity.
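To make the idea of zero-shot substitution prediction concrete, the following is a minimal sketch and not the Koliber platform's method, which this abstract does not describe. It uses one common scoring scheme: the log-odds of a mutant residue versus the wild-type residue at a masked position. A trained protein language model would normally supply those probabilities; here a simple per-position frequency count over unlabeled sequences stands in for it, so that the example runs without any downloads. The sequences, the function names (`toy_masked_probs`, `score_substitution`, `rank_substitutions`), and the pseudocount are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical, made-up unlabeled sequences of equal length (no activity labels).
UNLABELED_SEQS = ["MKTAY", "MRTAY", "MRTAF", "MRSAY", "MKTAY"]
WILD_TYPE = "MKTAY"


def toy_masked_probs(unlabeled_seqs, position, pseudocount=1.0):
    """Stand-in for a masked language model (assumption, not a transformer).

    Returns P(residue | masked position), estimated from per-position
    residue frequencies with additive smoothing.
    """
    counts = Counter(seq[position] for seq in unlabeled_seqs)
    total = len(unlabeled_seqs) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def score_substitution(unlabeled_seqs, wild_type, position, mutant_aa):
    """Log-odds score: log P(mutant) - log P(wild type) at the masked position."""
    probs = toy_masked_probs(unlabeled_seqs, position)
    return math.log(probs[mutant_aa]) - math.log(probs[wild_type[position]])


def rank_substitutions(unlabeled_seqs, wild_type):
    """Score every single substitution and sort from highest to lowest score."""
    scored = []
    for pos, wt_aa in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa != wt_aa:
                score = score_substitution(unlabeled_seqs, wild_type, pos, aa)
                # Label uses the usual 1-based notation, e.g. K2R.
                scored.append((score, f"{wt_aa}{pos + 1}{aa}"))
    scored.sort(reverse=True)
    return scored


if __name__ == "__main__":
    for score, label in rank_substitutions(UNLABELED_SEQS, WILD_TYPE)[:3]:
        print(f"{label}\t{score:.3f}")
```

No measured activity data enters the ranking: a positive score only means the stand-in model finds the mutant residue more plausible than the wild-type residue at that position. In practice the frequency model would be replaced by a protein language model's masked-token probabilities, and high-scoring substitutions would still need experimental validation.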