Rami Al-Rfou is a Member of Technical Staff at OpenAI, where he is a founding member of the robotics effort and leads work on embodied foundation models.
Previously, Rami was a Senior Staff Research Scientist at Waymo Research, where he led foundational motion modeling work for forecasting and planning. His team developed scalable transformer-based approaches for motion prediction, scaling laws, and efficient distillation methods for deployment.
Before Waymo, Rami was a Staff Research Scientist at Google Research. He led and contributed to multilingual and token-free language modeling work including mT5, ByT5, and prompt tuning, and helped deploy assisted writing systems such as SmartReply and SmartCompose across Google products.
Rami received his PhD in Computer Science from Stony Brook University under the supervision of Prof. Steven Skiena. His research has focused on large-scale representation learning across language, graphs, and embodied systems.
Responsibilities include:

We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for “strong” cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. This level of alignment is important for the practical task of cross-lingual information retrieval. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective, and improves significantly over using mBERT out-of-the-box. Interestingly, model performance on zero-shot variants of our task that only target “weak” alignment is not predictive of performance on LAReQA. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation, and suggests that measuring both weak and strong alignment will be important for improving cross-lingual systems going forward. We release our dataset and evaluation code at https://github.com/google-research-datasets/lareqa.
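The "strong" alignment criterion above can be illustrated with a minimal sketch. The vectors below are toy embeddings chosen for illustration, not real mBERT outputs; the check simply asks whether a relevant cross-language pair scores higher than an unrelated same-language pair under cosine similarity:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sentence embeddings (3-dim for illustration only).
q_en        = [0.9, 0.1, 0.0]   # English question
ans_de      = [0.8, 0.2, 0.1]   # semantically related German answer
distract_en = [0.1, 0.9, 0.3]   # unrelated English candidate

# Strong alignment: the related cross-language pair must be closer
# than the unrelated same-language pair in the candidate pool.
strongly_aligned = cosine(q_en, ans_de) > cosine(q_en, distract_en)
print(strongly_aligned)
```

A "weak" alignment test, by contrast, would only compare candidates within a single language at a time, which is why it can look solved while the pooled, language-agnostic retrieval task is not.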

Scaling behavior for motion forecasting and planning models in autonomous driving.

Multi-modality scene tokenization for motion prediction.

Distillation methods for scaling motion forecasting models.

A raw sensor benchmark for motion forecasting.

Multi-agent motion forecasting as language modeling.

Efficient attention architecture for motion forecasting.

Distillation for efficient and accurate scene-centric motion forecasting.

Soft prompt transfer for better adaptation of frozen models.

Token-free byte-to-byte language modeling at scale.

Prompt tuning improves substantially with model scale.

Parallel data and pre-training dynamics for massively multilingual LMs.

Prior work on data-to-text generation, the task of converting knowledge graph (KG) triples into natural text, focused on domain-specific benchmark datasets. In this paper, however, we verbalize the entire English Wikidata KG, and discuss the unique challenges associated with a broad, open-domain, large-scale verbalization. We further show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured and natural language to overcome the incompleteness of both sources. In contrast to the many architectures that have been developed to integrate the structural differences between these two sources, our approach converts the KG into natural text, allowing it to be seamlessly integrated into existing language models. It carries the further advantages of improved factual accuracy and reduced toxicity in the resulting language model. We evaluate this approach by augmenting the retrieval corpus in a retrieval language model and showing significant improvements on the knowledge-intensive tasks of open-domain QA and the LAMA knowledge probe.
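The core idea of verbalization, turning (subject, relation, object) triples into sentences that a text retrieval corpus can absorb, can be sketched in a few lines. The relation names and templates below are illustrative stand-ins, not the paper's actual pipeline (which uses a learned generation model rather than templates):

```python
# Toy template-based verbalizer for KG triples. Relation names and
# templates are hypothetical examples, not Wikidata's actual schema.
def verbalize(subject, relation, obj):
    templates = {
        "capital_of": "{s} is the capital of {o}.",
        "author": "{s} was written by {o}.",
        "instance_of": "{s} is a {o}.",
    }
    # Fall back to a generic pattern for unknown relations.
    template = templates.get(relation, "{s} {r} {o}.")
    return template.format(s=subject, r=relation.replace("_", " "), o=obj)

triples = [
    ("Paris", "capital_of", "France"),
    ("Hamlet", "author", "William Shakespeare"),
]

# Verbalized sentences can then be appended to a retrieval corpus.
corpus = [verbalize(*t) for t in triples]
print(corpus)
# → ['Paris is the capital of France.', 'Hamlet was written by William Shakespeare.']
```

Because the output is plain text, it needs no architectural changes to the downstream language model: the sentences are simply added to the corpus that the retrieval component searches over.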
Parameter efficient prompt tuning for efficient models at scale
US Patent 12,524,711 (2026)
Trajectory prediction using efficient attention neural networks
US Patent 12,497,079 (2025)
Adapting foundation models for autonomous driving
US Patent App. 19/209,351 (2025)
Scene tokenization for motion prediction
US Patent App. 18/950,830 (2025)
Behavior prediction using scene-centric representations
US Patent App. 18/913,074 (2025)