Model Evaluation & Threat Research (METR) is offering bounties for new problems to test the abilities of large language models (LLMs).
They want novel, unusual problems whose answers, or even hints toward them, can't be found anywhere on the Internet at large.
The tasks should be at least moderately hard, likely to stay hard over time, and have solutions whose quality is easy to evaluate.
METR has published more details on the types of problems they're looking for and the bounties they're offering.
#metr #llm #chatgpt #ModelEvaluationAndThreatResearch