Below is a comparison of phi-1's performance with other models. phi-1 achieved 50.6% accuracy on HumanEval, a dataset for evaluating programming ability, and 55.5% on MBPP. This result is ...
The team evaluated its technique on several benchmarks. On HumanEval, which consists of 164 Python programming problems the model has never seen, GPT-4 scored a record 67%, but ...
Code Llama 70B can generate and debug larger programming strings than Meta’s previous models. ... is a ...
Elon Musk’s xAI is upgrading its Grok AI chatbot. The new model outperformed an OpenAI model on HumanEval, a key coding benchmark. Musk stated in a Friday social media post that Grok 1.5 should be ...
... have surpassed OpenAI's GPT-4 in coding capabilities, has announced its new flagship large language model, Phind-405B. Phind-405B scores 92% on HumanEval, matching Claude 3.5 Sonnet. We're ...