
I am a Reader at the University of Edinburgh working on fast and high-quality machine translation with colleagues in the machine translation group. More broadly, I make fast neural networks and language models; see my papers or CV.
I ran the project that launched private local machine translation for Firefox and desktops; it even runs an entire translation system locally on your machine inside a web page.
According to the New York Times, I am a native speaker of C++ "on semipermanent loan from the Internet" and my t-shirt collection is "threadbare."
Students interested in studying with me should apply to our PhD programme or the Centre for Doctoral Training in Natural Language Processing. A background in natural language processing or systems is a plus.
My company Efficient Translation Limited optimizes machine translation systems for production.
Brief CV
| Institution | Role |
|---|---|
| Edinburgh | Reader |
| Edinburgh | Lecturer |
| Bloomberg | Senior Research Scientist |
| Stanford | Postdoc |
| Edinburgh | Research Associate |
| Carnegie Mellon | PhD advised by Alon Lavie |
| Google | Software Engineer |
| Caltech | BSc, Mathematics and Computer Science |
Funding
Large projects:
- Coordinating the Bergamot project to preserve privacy in machine translation by running client-side as a Firefox extension. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303.
- The ParaCrawl project mines the web, including a petabyte of the Internet Archive, for large parallel corpora in 29 languages. Funded by the EU's Connecting Europe Facility.
- Coordinating the Europat project to mine patents for free parallel corpora.
- Coordinating the User-Focused Marian project, which is adding features like forced translation, runtime domain adaptation, and 8-bit GPU support to the Marian toolkit. Funded by the EU's Connecting Europe Facility.
- The UKRI Centre for Doctoral Training in Natural Language Processing is training 50 PhD students over eight years; I am responsible for industry involvement.
- The Columbia-led SCRIPTS team is building low-resource cross-lingual information retrieval for the IARPA MATERIAL program.