How Booking.com Uses AI to Solve Real Problems, Without the Hype
The AI Engineer Summit wrapped up a few weeks ago in New York. If you missed it, don’t worry: I’m here to share key insights on AI agents!
One presentation stood out to me: Booking.com’s collaboration with Sourcegraph to implement generative AI with real ROI. What I love about this journey is that there’s no vague corporate speak and no overly ambitious, unrealistic promises. Instead, Booking.com’s approach focused on clear KPIs, bottom-up implementation, and gradual scaling over time.
The problem
Booking.com faced a growing codebase, which led to increased cycle times. This is the classic tech-debt challenge: companies prioritize new features to stay competitive and defer cleaning up problematic code until later.
The solution
To tackle this, Booking.com partnered with Sourcegraph and followed three key steps:
- They started with code search to improve developer velocity.
- Next, they introduced a coding assistant for code generation and Q&A.
- Finally, they began developing AI agents to automate various steps in the software development lifecycle.
Small steps, scaled over time
As I mentioned earlier, Booking.com’s AI journey resonated with me because it aligns with my core values: clear KPIs, bottom-up implementation, and scaling over time.
Their path wasn’t mapped out in advance. Instead, they assessed their understanding of AI at each stage and applied it to a problem that matched their level of expertise.
Here’s how the journey unfolded, starting in January 2024:
- January 2024: New to generative AI, the company set a KPI based on hours saved.
- July 2024: After monitoring AI usage, they confirmed that developers using AI were more effective. This prompted them to train developers on integrating AI into their coding workflow.
- October 2024: With deeper insights, they moved beyond their initial KPIs and set more workflow-specific metrics (more on this in the next section).
- November 2024: One key metric they tracked was merge requests. They found that developers using LLMs daily contributed 30% more merge requests.
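To make that kind of comparison concrete, here’s a minimal sketch in Python of how you might compare merge-request throughput between developers who use an LLM assistant daily and those who don’t. The numbers and the grouping are entirely hypothetical, not Booking.com’s actual data or tooling.

```python
# Hypothetical illustration of the merge-request metric: comparing
# throughput for developers who use an LLM assistant daily vs. those
# who don't. The figures below are fabricated sample data; in practice
# you'd export them from your version-control and AI-usage analytics.
from statistics import mean

# merge requests per developer over one month (fabricated)
daily_llm_users = [13, 12, 14, 13, 13]
other_devs = [10, 9, 12, 8, 11]

avg_llm = mean(daily_llm_users)
avg_other = mean(other_devs)

uplift = (avg_llm - avg_other) / avg_other * 100
print(f"Daily LLM users: {avg_llm:.1f} MRs/dev; others: {avg_other:.1f} MRs/dev")
print(f"Relative uplift: {uplift:.0f}%")
```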
Specific metrics across different time horizons
As they progressed, Booking.com refined their metrics to align with their workflow. Instead of broad KPIs, they set benchmarks over different time horizons, ensuring relevance while accounting for AI’s rapid evolution.
Here’s how they structured their metrics:
- Short-term: Lead time for change (e.g., time to review merge requests, debug, and merge); see the sketch after this list.
- Medium-term: Quality improvements (e.g., test coverage, vulnerability reduction) and codebase insights (e.g., identifying dead code or non-performant code).
- Long-term: Tech modernization (e.g., reducing monolith dependencies, clearing feature flags).
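To illustrate the short-term bucket, here’s a minimal sketch of computing lead time for change as the hours between a merge request being opened and merged. The records and field names are made up; in practice you would pull them from your Git hosting platform’s API.

```python
# Hypothetical sketch of the short-term metric, "lead time for change",
# measured here as the time from when a merge request is opened to when
# it is merged. The records below are fabricated.
from datetime import datetime
from statistics import median

merge_requests = [
    {"opened": "2024-10-01T09:00", "merged": "2024-10-01T15:30"},
    {"opened": "2024-10-02T10:00", "merged": "2024-10-03T11:00"},
    {"opened": "2024-10-03T14:00", "merged": "2024-10-04T09:00"},
]

def hours_to_merge(mr):
    opened = datetime.fromisoformat(mr["opened"])
    merged = datetime.fromisoformat(mr["merged"])
    return (merged - opened).total_seconds() / 3600

lead_times = [hours_to_merge(mr) for mr in merge_requests]
print(f"Median lead time for change: {median(lead_times):.1f} hours")
```

Tracking the median rather than the mean keeps a handful of long-running merge requests from skewing the picture.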
As we can see, Booking.com didn’t start with a rigid master plan. Instead, they iterated at each stage, understanding AI’s capabilities and applying them meaningfully to their workflow.
If you’re considering AI implementation in your enterprise, don’t let analysis paralysis hold you back. Start small, take a bottom-up approach, and refine your metrics over time.
What are your experiences with implementing AI in your enterprise? Drop a comment below.