Work Mode: Remote
Engagement Type: Independent Contractor
Schedule: Full-Time or Part-Time Contract
Language Requirement: Fluent English
Role:
In this role, you will partner with leading AI teams to improve the quality, usefulness, and reliability of general-purpose conversational AI systems. These systems are used across a wide range of everyday and professional scenarios, and their effectiveness depends on how clearly, accurately, and helpfully they respond to real user questions.
In engineering-related contexts, conversational AI systems must demonstrate accurate applied reasoning, quantitative precision, and practical problem-solving aligned with real-world systems. This project focuses on evaluating and improving how models reason about and explain engineering concepts across multiple disciplines.
What You’ll Do
• Write and refine prompts to guide model behavior in engineering scenarios
• Evaluate LLM-generated responses to engineering-related queries for technical accuracy, applied reasoning, and completeness
• Conduct fact-checking and verify any technical claims using authoritative public sources and domain knowledge
• Annotate model responses by identifying strengths, areas for improvement, and factual or conceptual inaccuracies
• Assess clarity, structure, and appropriateness of explanations for different audiences
• Ensure model responses align with expected conversational behavior and system guidelines
• Apply consistent evaluation standards by following clear taxonomies, benchmarks, and detailed evaluation guidelines
Who You Are
• You hold a PhD in Engineering or a closely related field
• You have deep expertise in one or more of the following sub-domains:
  • Mechanical & Physical Systems Engineering
  • Electrical, Electronic & Computer Engineering
  • Chemical, Materials & Process Engineering
  • Civil, Environmental & Infrastructure Engineering
• You have significant experience using large language models (LLMs) and understand how and why people use them
• You have excellent writing skills and can clearly explain complex engineering concepts
• You have strong attention to detail and consistently notice subtle issues others may overlook
• You have experience reviewing or editing technical or academic writing
Nice-to-Have Specialties
• Experience with applied research, industry engineering workflows, or systems design
• Prior experience with RLHF, model evaluation, or data annotation work
• Experience teaching, mentoring, or explaining engineering concepts to non-expert audiences
• Familiarity with evaluation rubrics, benchmarks, or structured review frameworks
What Success Looks Like
• You identify technical inaccuracies, flawed assumptions, or incomplete reasoning in engineering-related model outputs
• Your feedback improves the rigor, clarity, and correctness of AI explanations
• You deliver consistent, reproducible evaluation artifacts that strengthen model performance