The successful candidate will join a highly cross-functional team dedicated to product software, back-end infrastructure, and tooling that supports our machine learning initiatives. Our team collaborates with a wide range of internal teams across the company, as well as developers and external partners. This role is focused on the development and deployment of innovative systems and methods for data science and synthesis, while staying current with the latest advancements in these fields.
Key responsibilities include:
* Developing strategies and algorithms for mining large volumes of data, numbering in the billions of items, for targeted model training.
* Addressing challenges in automatic evaluation of generative results, identification and classification of failure cases, and strategies for assessing their prevalence and severity.
* Streamlining human-in-the-loop processes for dataset construction, and creating and implementing systems that account for subjectivity and human error.
* Synthesizing training data as well as augmentations to real-world data.
Qualifications include:
Bachelor's, Master's, or PhD in Computer Science, Mathematics, Physics, or a related field; or equivalent experience.
Significant professional experience as a Data Scientist, Machine Learning Researcher or Applied Scientist in the areas of computer vision, language processing or data synthesis.
Up to date with the state of the art in machine learning and generative technologies.
Strong ability in one or more programming languages, preferably Python.
Knowledge of and proficiency with industry-standard tools and methods for model training, statistical analysis and data science.
Ability to make significant hands-on contributions to algorithms and/or their implementation.
Outstanding communication and presentation skills and the ability to explain difficult technical topics to everyone from data scientists to engineers to business partners.