The universe of possible protein sequences is vast.

There are more possible 64-amino-acid sequences than atoms in the observable universe. The vast majority of these are non-functional.
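The arithmetic behind that comparison is quick to check. The sketch below assumes the 20 standard amino acids and uses the commonly cited order-of-magnitude estimate of 10^80 atoms in the observable universe; that figure is a rough assumption, not a measured value.

```python
# Count the distinct sequences of length 64 over the 20 standard amino acids.
ALPHABET_SIZE = 20
LENGTH = 64
num_sequences = ALPHABET_SIZE ** LENGTH  # exact integer arithmetic

# Commonly cited order-of-magnitude estimate (an assumption for illustration).
ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80

# How many times larger is sequence space than the atom count?
ratio = num_sequences // ATOMS_IN_OBSERVABLE_UNIVERSE

print(f"Possible 64-residue sequences: about {num_sequences:.2e}")
print(f"Sequences per atom (rounded down): {ratio}")
```

Under these assumptions, sequence space at length 64 is already on the order of 10^83, and it grows by a factor of 20 with every additional residue.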

Functional proteins are like a single galaxy in the vast universe.

OpenProtein.AI maps the sequence universe to focus only on the tiny subspace of functional proteins.

Find protein sequences that meet your design goals.

OpenProtein.AI navigates the functional landscape of your protein and generates variants to optimize your protein's function.

The OpenProtein.AI web platform offers machine learning-powered protein engineering at your fingertips.

OpenProtein.AI fills in missing homology to generate novel, diverse, and functional proteins de novo.

Traditional protein engineering approaches limit searches to known cladistic branches. These approaches are slow and expensive, and they miss novel variants from families distant from the original sequences.


OpenProtein.AI learns from natural proteins to generate novel functional protein sequences, covering a much broader range of the cladistic sequence tree around any starting amino acid sequence.


OpenProtein.AI designs focused mutagenesis libraries that reflect functional constraints and epistasis, yielding higher success rates. No protein structures required.

Get more out of your mutagenesis data.

OpenProtein.AI achieves state-of-the-art function prediction accuracy for any protein, for any property, for anybody.

Our iterative design framework integrates with your assays to predict function from sequence. Our model adapts to any design goal you have, and learns from your functional measurements in a continuously self-improving cycle.


Seamless integration with your assay data enables OpenProtein.AI to optimize your functional properties directly. No surrogate properties like stability or binding affinity required.


OpenProtein.AI can optimize multiple properties simultaneously. Consider all of your properties earlier in the engineering process.

Our patented technology is detailed in peer-reviewed research papers.

Bepler, T. and Berger, B., 2021. Learning the protein language: Evolution, structure, and function. Cell Systems, 12(6), pp.654-669.

Bepler, T. and Berger, B., 2019. Learning protein sequence embeddings using information from structure. ICLR 2019.

Ram, S. and Bepler, T., 2022. Few Shot Protein Generation. arXiv preprint arXiv:2204.01168.


Design Better Variants

Design custom variant libraries with demonstrated >10x improvements over conventional mutagenesis library designs.
Identify key positions to target for single-site or combinatorial mutagenesis.

Design More Diverse Variants

Diverse candidates are critical for downstream success. Whereas similar candidates tend to succeed or fail together, a diverse set of candidates increases the probability that at least one succeeds.
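A back-of-the-envelope calculation shows why. The sketch below compares two idealized extremes: candidates so similar that they all succeed or fail together, and candidates whose outcomes are fully independent. The per-candidate success rate `p` and library size `n` are hypothetical values chosen for illustration; real libraries fall somewhere between the two extremes.

```python
def p_any_success_correlated(p: float, n: int) -> float:
    """Candidates that succeed or fail together behave like a single candidate."""
    return p


def p_any_success_independent(p: float, n: int) -> float:
    """With independent outcomes, at least one succeeds unless all n fail."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical numbers: a 10% per-candidate success rate and 10 candidates.
p, n = 0.1, 10
print(f"Fully correlated candidates: {p_any_success_correlated(p, n):.2f}")
print(f"Fully independent candidates: {p_any_success_independent(p, n):.2f}")
```

With these illustrative numbers, the chance of at least one success rises from 0.10 in the fully correlated case to about 0.65 in the fully independent case.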

Optimize Multiple Properties in Parallel

We can design for multiple properties simultaneously to ensure variants display optimal activity, expressibility, thermostability, and other properties of interest.

Mine Public & Proprietary Data

Given a functional description of a protein and/or starting protein templates, we mine natural sequence databases for diverse proteins predicted to have similar function.

Reduce Iterations

We learn from your experimental data to improve future designs and reduce the number of iterations required. Learnings from one project can benefit and accelerate other projects.

The OpenProtein.AI Web App

ML-driven protein engineering at your fingertips.

Large-scale deep learning models are unlocking new capabilities in protein engineering, but these models are expensive to deploy and require dedicated teams to develop. Our democratized platform brings state-of-the-art machine learning to the bench through a web-based application that is accessible to non-computer scientists. No expensive infrastructure, setup, or specialized skills required.

We offer free access to academic researchers, and commercial options range from platform access to partnerships.

Derive insights from your mutagenesis data to design optimized libraries, accelerate design-build-test iterations, and track your progress all in one easy-to-use web app.


Our solutions. Tailored to your needs.

We work directly with your team and data to understand your protein, assay, and design goals. We specialize our platform to your use case and deliver custom libraries designed to meet your specifications. You build and assay those sequences in your system.

We are protein engineers, machine learning pioneers, and experienced entrepreneurs dedicated to democratizing state-of-the-art technology and improving protein engineering workflows.

Tristan Bepler, PhD

Tristan is a machine learning scientist, group leader of the Simons Machine Learning Center at the New York Structural Biology Center, and CEO and co-founder of OpenProtein.AI. He received his PhD from MIT in the Computational and Systems Biology Program. Before starting OpenProtein.AI, Tristan pioneered large language models for learning protein sequence representations and their application to protein property prediction. He is also passionate about machine learning methods for understanding protein structures in native states and accelerating cryo-EM.

Tim Lu, MD, PhD

Dr. Lu is a serial biotech entrepreneur and a faculty member in Electrical Engineering and Computer Science and in Biological Engineering at MIT. Dr. Lu has been a co-founder and Scientific Advisory Board member of a number of biotechnology and biopharmaceutical companies, including Senti Bio, BiomX, Corvium, Eligo Bioscience, Engine Biosciences, Synlogic, and Tango Therapeutics.

Drop us a message to learn more about what we offer and how our platform can improve your protein engineering process.