Programming Languages Performance Benchmark

23h Opinion

AI’s most important benchmark in 2026? Trust

In 2026 (and beyond) the best benchmark for large language models won’t be MMLU or AgentBench or GAIA. It will be trust ...

Z.ai Releases GLM-4.7 Designed for Real-World Development Environments, Cementing Itself as "China's OpenAI"

On December 22, Z.ai released GLM-4.7, the latest iteration of its GLM large language model family. Designed to handle ...

Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows

Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source models move beyond chat-based applications and into production ...

MiniMax Unveils M2.1 to Bring Multilingual Programming Gains to Open AI Models

Chinese AI startup’s release is a major update to its open-source model series, aimed at multi-language programming and ...

10d

MiniMax releases M2.1 AI model for multi-language programming versatility

MiniMax M2 was released in late October this year. The company stated that M2.1 demonstrated significant improvements in ...

HotHardware

AMD Ryzen 7 9850X3D Benchmark Leak Shows Mixed Performance

While not yet released at retail or even publicly announced, AMD's Ryzen 7 9850X3D processor is confirmed to exist by way of a support document on AMD's website. It's also been benchmarked, assuming a ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

TechCrunch

Google launches Gemini 3 with new coding app and record benchmark scores

On Tuesday, Google released Gemini 3, its latest and most advanced foundation model, which is now immediately available through the Gemini app and AI search interface. Coming just seven months after ...

InfoQ

Code Arena Launches as a New Benchmark for Real-World AI Coding Performance

VentureBeat

Moonshot's Kimi K2 Thinking emerges as leading open source AI, outperforming GPT-5, Claude Sonnet 4.5 on key benchmarks

Even as concern and skepticism grow over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has even ...
