The Best-Performing Public-Facing LLM Is Now Free
My Deep Dive into Claude 3.5 Sonnet
A few days ago, I embarked on a thrilling experiment. As a software engineer and AI enthusiast, I was eager to test the latest AI model, Claude 3.5 Sonnet, released by Anthropic. I'd heard whispers of its capabilities and how it supposedly outperformed even its larger sibling, Claude 3 Opus. Could it really be true? I had to find out.
The Power of Claude 3.5 Sonnet
What I discovered was nothing short of astonishing. Claude 3.5 Sonnet, despite its smaller size, consistently outperformed not only Claude 3 Opus but also other leading models, including GPT-4o and Llama 3 400B. It aced a range of benchmarks, from MMLU to zero-shot chain-of-thought evaluations, stumbling only slightly on the math benchmark.
Putting the Model to the Test
I started with a simple task: writing a Python script to print the numbers 1 to 100. Claude 3.5 Sonnet passed with flying colors, producing a correct script effortlessly. Next, I threw it a curveball: write the game Snake in Python. Again, the model delivered, producing a working game that even had quirky features like wall-phasing and an on-screen score (a comparable sketch is shown below). I was genuinely impressed by its ability to grasp and execute complex instructions.
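To give a sense of what that second test involves, here is a minimal sketch of the kind of Snake game the model produced, assuming pygame is installed (`pip install pygame`). This is not the exact code Claude 3.5 Sonnet generated, just an illustrative example with the same two quirks I mentioned: wall-phasing (the snake wraps around the edges) and an on-screen score.

```python
# A minimal Snake sketch, assuming pygame; illustrative only, not the
# model's verbatim output.
import random
import pygame

CELL, GRID_W, GRID_H = 20, 30, 20
WIDTH, HEIGHT = CELL * GRID_W, CELL * GRID_H

pygame.init()
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Snake")
clock = pygame.time.Clock()
font = pygame.font.SysFont(None, 28)

snake = [(GRID_W // 2, GRID_H // 2)]
direction = (1, 0)
food = (random.randrange(GRID_W), random.randrange(GRID_H))
score = 0
running = True

while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            keys = {pygame.K_UP: (0, -1), pygame.K_DOWN: (0, 1),
                    pygame.K_LEFT: (-1, 0), pygame.K_RIGHT: (1, 0)}
            if event.key in keys:
                direction = keys[event.key]

    # Wall-phasing: the head wraps around the screen edges instead of dying.
    head = ((snake[0][0] + direction[0]) % GRID_W,
            (snake[0][1] + direction[1]) % GRID_H)
    if head in snake:          # colliding with yourself ends the game
        running = False
    snake.insert(0, head)

    if head == food:           # eat the food, grow, and bump the score
        score += 1
        food = (random.randrange(GRID_W), random.randrange(GRID_H))
    else:
        snake.pop()            # no food eaten, so the tail follows the head

    screen.fill((0, 0, 0))
    for x, y in snake:
        pygame.draw.rect(screen, (0, 200, 0), (x * CELL, y * CELL, CELL, CELL))
    pygame.draw.rect(screen, (200, 0, 0),
                     (food[0] * CELL, food[1] * CELL, CELL, CELL))
    # On-screen score, the other feature the generated game included.
    screen.blit(font.render(f"Score: {score}", True, (255, 255, 255)), (10, 10))
    pygame.display.flip()
    clock.tick(10)

pygame.quit()
```

Even a toy version like this touches event handling, game state, collision logic, and rendering, which is why it makes a handy smoke test for code generation.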