🔬 My Sunday Project: Latent Space Lab
There is a fundamental problem with how we consume AI advancements.

We see a demo on Twitter. It looks flawless. The reasoning is perfect, the code compiles, and the output is exactly what was asked for. We assume the technology has leaped forward.

But I’m the CTO at JetLearn. I don’t live in "Demo World." I live in production. And every Tuesday at 11 PM, I find myself staring at a terminal, wondering why that same "perfect" tool just choked on a messy, real-world user request.

This creates a paradox. Is the model actually smarter? Or did the demo just get lucky?

During the work week, I don’t have time to answer that. I have roadmaps to hit and meetings to run. So the question just sits there, nagging me.

Which is why I’ve decided to run an experiment.

I’m calling it the Latent Space Lab.

Every Sunday, I’m taking one of those unanswered questions from the week—a new prompting technique, a specific agent framework, a "state-of-the-art" claim—and I’m going to stress-test it.

  • No Clean Data: I’m feeding it the noisy, broken inputs that actual humans generate.

  • No Vibe Checks: I want to measure the delta. Is it 10% better? Is it 2%? Or is it statistically insignificant?

  • No PR: This isn't a corporate blog. It’s just me, a pot of coffee, and some code.

I want to know what happens when you push these tools until they break.
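To make "measure the delta" concrete: one simple way to check whether a win-rate gap between two tools is real or just noise is a two-proportion z-test over pass/fail runs. This is a minimal sketch with made-up numbers, not results from any actual test; the function name and the example counts are mine.

```python
import math

def delta_significance(wins_a, n_a, wins_b, n_b):
    """Two-proportion z-test: is tool B's success rate actually better
    than tool A's, or is the observed delta within noise?"""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    # Pool the two samples to estimate the shared success rate under
    # the null hypothesis (no real difference between the tools).
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF (math.erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, p_value

# Hypothetical run: 62/100 successes vs 70/100. An 8-point delta
# sounds impressive, but is it statistically significant?
delta, p = delta_significance(62, 100, 70, 100)
print(f"delta={delta:.2f}, p={p:.3f}")
```

With 100 runs per side, an 8-point gap comes back with p well above 0.05, which is exactly the kind of "looks better, proves nothing" result a vibe check would miss.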

If you’re curious to see the results—failures included—you’re welcome to join me.

Let’s see what we find.
