HumanEval - Search News

Microsoft announces 'phi-1', which scores 50.6% on HumanEval, exceeding GPT-3.5 with only 1.3 billion parameters

Below is a comparison of phi-1's performance with that of other models. phi-1 achieved 50.6% accuracy on HumanEval, a dataset for evaluating programming ability, and 55.5% on MBPP. This result is ...

New Atlas

GPT-4 becomes 30% more accurate when asked to critique itself

The team used its technique against a few different performance tests. In the HumanEval test, which consists of 164 Python programming problems the model has never seen, GPT-4 scored a record 67%, but ...

The Verge

Meta’s free Code Llama AI programming tool closes the gap with GPT-4

Code Llama 70B can generate and debug larger programming strings than Meta’s previous models. is a ...

Entrepreneur

Elon Musk’s Newest AI Chatbot Outperformed ChatGPT in One Key Area

Elon Musk’s xAI company is upgrading its Grok AI chatbot. The new model outperformed OpenAI’s AI model on one key HumanEval test. Musk stated in a Friday social media post that Grok 1.5 should be ...

GIGAZINE

AI search engine startup Phind announces flagship model 'Phind-405B'

have surpassed OpenAI's GPT-4 in coding capabilities, has announced its new flagship large-scale language model, Phind-405B. Phind-405B scores 92% on HumanEval, matching Claude 3.5 Sonnet. We're ...
