
Applied AI Research: How InstaDeep Is Solving the Hard Problem

AI company InstaDeep cannot build the products the industry demands without conducting its own research in-house. Yet bridging the gap between a lab breakthrough and a functional product remains the ‘hard problem’ facing the next generation of industrial AI.

IT’S ONE THING to have a research team within your organisation. It’s quite another to apply that research to your products.

Thomas Lecat knows this challenge well. He spent five years as lead engineer on InstaDeep’s project with Deutsche Bahn, the German national railway operator. The project tackles one of the most complex real-time optimisation problems in industrial AI, with hundreds of trains, a live network, and incidents that can demand rerouting decisions in seconds. “This type of work we do is very difficult,” states Farhad Safaei, Technical Development Lead at Deutsche Bahn. “To get a small percentage increase in performance, you have to do a lot of work, a lot of implementations, and new techniques.”

The collaboration between InstaDeep and Deutsche Bahn started in 2019. “What I always liked about InstaDeep,” says Safaei, “was the motivation that I saw and the passion for solving problems. It was not just another consultancy project.” It was this approach, inherent to InstaDeep’s culture, that drove Lecat and the team to keep pushing for ideas from research. “Research would come to us with something brilliant,” Lecat recalls, “and then we’d come back to reality and realise we couldn’t actually use it. The benchmark environments they used were really small, really lightweight. They could reset them at any state. Then you come back to the real environment, one that implements the physics and the constraints, and the simulation is not as fast. The freedom disappears.” He pauses. “And that happened repeatedly.”

This is not an InstaDeep problem. It’s an industry problem, and perhaps a defining structural challenge of applied AI research. Research teams optimise for publication; product teams optimise for shipping. The assumptions that make a new algorithm shine on a benchmark can make it unusable in production. The result is that breakthrough ideas can languish in papers while product teams quietly find another way.

“Oftentimes in research, it’s hard to find something that really makes an improvement…whereas for these inference strategies, there was, from one day to another, very significant improvements.”

Stefan Schneider, Research Engineer at Deutsche Bahn

InstaDeep was aware of this challenge from the get-go and determined to do things differently. Yet the conversation that gave real momentum to the change didn’t happen in a boardroom. It happened over coffee in a cramped Paris office sometime in 2024, between Lecat and Felix Chalumeau, a research engineer who was then working on population-based optimisation.

It wasn’t their first conversation. Months earlier, the two had found themselves stuck in a Tunisian airport, waiting for a delayed plane. In that suspended, in-between space, they’d started talking – Lecat about the complexity of routing trains across a live network, Chalumeau about a family of methods he’d been researching for two years: inference-time strategies for reinforcement learning.

The Paris chat was a follow-up. Chalumeau had since moved into another area of research and was stepping back from this work. But Lecat and the team felt that performance on the Deutsche Bahn project wasn’t where they wanted it to be. Stefan Schneider, Research Engineer at Deutsche Bahn, recalls this period: “I don’t know if it was a hard ceiling, but it felt like a ceiling.” So Lecat put two questions to Chalumeau: what had he learned, and what might actually be useful?

“In the past, the research team had presented things to the applied teams,” Chalumeau explains, “but in a very out-of-the-blue manner, which sometimes didn’t raise much interest, just because it wasn’t presented in a way that filled any request or need from the applied side.” This time, he tried something different. He took two years of research and mapped the entire literature onto a single framework built around two practical questions: how much time do you have for inference, and how much compute?

A few days after he presented this to the Deutsche Bahn team, they tried one of the simpler strategies, stochastic rollout. The method adds small amounts of noise to generate multiple candidate solutions simultaneously, then keeps the best one. The result was immediate. “Oftentimes in research, it’s hard to find something that really makes an improvement…whereas for these inference strategies, there was, from one day to another, very significant improvements,” says Schneider. “We’re all the time making small improvements, and then sometimes there are these big jumps and big big differences. This wasn’t the only one, but it was an important one.”
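The idea behind stochastic rollout can be illustrated with a minimal Python sketch on a toy travelling-salesman problem, which stands in for the far harder railway setting. This is an illustration under stated assumptions, not InstaDeep’s or Deutsche Bahn’s code: the “policy” is a hypothetical nearest-neighbour heuristic, and the names `rollout`, `stochastic_rollout`, `noise` and `n_rollouts` are invented for this example. With zero noise the policy always produces the same answer. With a little noise each rollout can diverge, and the search keeps the shortest tour it finds.

```python
import math
import random


def tour_length(points, tour):
    """Total length of a closed tour that visits the points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def rollout(points, noise, rng):
    """Build one tour with a hypothetical greedy 'policy' (nearest unvisited city).

    With noise=0.0 this is plain deterministic greedy decoding. With noise > 0,
    each candidate's score gets a small random perturbation, so rollouts can diverge.
    """
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        current = points[tour[-1]]
        # Lower score wins: distance to the city plus a small random perturbation.
        nxt = min(
            unvisited,
            key=lambda j: math.dist(current, points[j]) + noise * rng.random(),
        )
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def stochastic_rollout(points, n_rollouts=64, noise=0.1, seed=0):
    """Run many noisy rollouts and keep the shortest tour found."""
    rng = random.Random(seed)
    best = rollout(points, 0.0, rng)  # the deterministic, zero-shot answer
    best_len = tour_length(points, best)
    for _ in range(n_rollouts):
        candidate = rollout(points, noise, rng)
        length = tour_length(points, candidate)
        if length < best_len:
            best, best_len = candidate, length
    return best
```

Because the deterministic answer is the starting point, the search can never do worse than the zero-shot policy. Raising `n_rollouts` spends more inference time and compute in exchange for a better chance of improvement, which mirrors the time-and-compute framing Chalumeau used.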

Lecat confirms the impact and frames what it meant beyond the metric: “The big aha moment was to suddenly invest more time and resources on the inference side, which is the second part of the pipeline, after training is done. That unlocked a lot of performance gains.” For a team that had spent years focused on training, it opened a new, parallel path to improving performance.

For Deutsche Bahn, the improvement increased the reliability and quality of results. Safaei notes, “Instead of just trying to come up with one solution, the model gives us flexibility to tweak a bit, come up with different solutions and choose from that.” Today, Wissam Bejjani leads InstaDeep’s work with Deutsche Bahn, continuing its development while bringing it closer to real-world deployment.

For Chalumeau, the significance was different. Several earlier attempts to transfer research to applied teams hadn’t come to fruition. “It brought a bit more confidence into doing this research again,” he says. “A confidence that sometimes in the past was maybe slightly decreased by several attempts that failed.” This confidence would matter.

Viewed from above, a printed circuit board (PCB) looks remarkably like a map of a railway network: tracks branch, intersect, and route between fixed points. The resemblance is not coincidental. Both are instances of the same type of problem: finding the optimal path through a near-infinite number of possibilities.


“What I always liked about InstaDeep was the motivation that I saw and the passion for solving problems.”

Farhad Safaei, Technical Development Lead, Deutsche Bahn

A PCB packs thousands of copper tracks, many thinner than a human hair, into a footprint as small as a fingernail, or smaller in a device such as an AirPod. Each track must reach its designated point without crossing another track, overheating, or violating electromagnetic interference rules. Lecat offers the clearest definition of the underlying challenge: “It’s easy to find one solution. Finding the best solution is very hard. You have many, many possibilities.”

In computer science, this is called a combinatorial optimisation problem, the same class of problem as routing trains, sequencing amino acids, or scheduling logistics. The number of possible configurations grows exponentially with the size of the problem, and this is where other methods, such as generative AI, fall short. Large language models are probabilistic – they are, at their core, systems for predicting plausible sequences of tokens. Engineering is deterministic. A hallucination in a language model is a creative quirk, but on a PCB, it’s scrap metal. As Karim Beguir, InstaDeep’s CEO and co-founder, puts it: “Ask Claude, ChatGPT, or any frontier model to design a printed circuit board respecting all design rule checks, and they will not be able to do it.”

InstaDeep’s approach treats the problem differently, combining reinforcement learning with inference-time search. An RL agent learns from millions of simulated attempts, earning a reward for each clean connection and a penalty for each violation. But training alone reaches a ceiling. The lesson from the Deutsche Bahn experiment was that what happens after training, at inference time – when the model is given a real customer’s board to solve – matters just as much.

What followed was a period of recalibration at InstaDeep. The shift from separate departments to close collaboration was, as Lecat describes it, a deliberate choice. It required the research and applied teams to work together and draw on each other’s knowledge. Like most meaningful organisational changes, it took bold conversations, including one that happened, unexpectedly, during a fire alarm.

Arnu Pretorius leads InstaDeep’s reinforcement learning research team from Cape Town. At that time, the company’s focus was broadening, and he explored whether his team should expand into bio applications. But the team had conviction about where their strengths lay. They had spent years laying foundations in industrial optimisation and fundamental RL research. More than half of the team wanted to double down on what they did best. Pretorius listened. “In hindsight, I so appreciated the resistance,” he says. “It’s not worth losing all that momentum and years of hard work.”

The team stayed on RL. But the question of what to do with it (specifically, how to connect the research to an actual product) remained in flux. Conversations began about partnering with the DeepPCB applied team, but Pretorius had a condition: the teams would need to align their codebases. His reasoning was straightforward. When research ideas were passed to the applied team and failed to work, no one could tell whether the idea was flawed or whether something had been lost in translation between the two incompatible systems. A shared codebase was the only way to know.

The DeepPCB team wanted research engineers to join their codebase and integrate the algorithms into the existing production system. Pretorius’s position was the opposite: if the collaboration was to succeed, the applied team needed to move closer to the research codebase, where the algorithms had already been proven. “So I told them: we’re not going to do it. I’m not in,” Pretorius recalls.

A day or two later, a fire alarm went off in the London office. As hundreds of people filed onto the pavement, Amine Kerkeni, Head of Applied AI at InstaDeep, found Pretorius in the crowd. Standing outside in the cold, he made the case: Please reconsider.

Those unscheduled moments on the pavement were the turning point. The teams reached a compromise: the applied team would migrate toward the research codebase. Not immediately, but as a direction of travel. “It took almost a year of digging deep into the engine,” Pretorius says. “There were things happening in the PCB simulation that aren’t in the assumptions we made during research. A huge amount of engineering work just had to get things in the right order.” At the time of writing, they are two or three weeks away from the research codebase becoming the standard codebase.

While the codebase alignment was grinding forward, the research team was producing work that gave the whole effort its intellectual foundation. COMPASS, an InstaDeep method for combinatorial optimisation based on latent space search, and the subsequent NeurIPS oral paper on inference-time strategies, “Breaking the Performance Ceiling in Reinforcement Learning Requires Inference Strategies”, together established something the field had largely ignored.

The prevailing focus in reinforcement learning had been on training: more data, more compute, more parameters. What InstaDeep’s research demonstrated across more than 60,000 experiments and 17 complex tasks is that practitioners were missing half the picture. Giving a model even 30 seconds of structured search time at inference produced an average 45 per cent improvement over zero-shot performance. On the hardest tasks, COMPASS pushed performance by more than 80 per cent.

“There was no massive algorithmic novelty in that paper,” Chalumeau says, with the candour of someone genuinely comfortable with that fact. “We just brought a completely different view on those methods that was kind of ignored by a big part of the community. We shed light on a whole aspect of the RL pipeline that was mostly ignored and underestimated by practitioners.”

“Input from research helps product, but input from product helps research. That’s the only way we can outpace everyone else.”

Thomas Lecat, Head of Growth, InstaDeep

The NeurIPS oral acceptance placed the work in the top 0.3 per cent of submissions. This was external validation that the insight was real. But what Chalumeau finds more satisfying is that the Deutsche Bahn experiment gave the research team confidence to pursue inference strategies seriously. That pursuit produced the paper, and the paper is now flowing back into DeepPCB. “It came from the research, went through discussions with the applied teams, those discussions had an impact on the applied teams, but also had an impact on us. It helped us realise the power of the research we had been doing. And so it inspired a new cycle.”

Pretorius uses a simpler image to explain what COMPASS actually does at inference time. A model without inference-time search is like a blindfolded person throwing a ball at a target who always throws it the same way. COMPASS says: throw the ball a thousand times, and each time make a small adjustment so it flies differently. One of those thousand throws hits the target. You find it, and you home in. You trade thirty seconds of thinking time for a substantially better solution.
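Pretorius’s analogy maps onto a simple search loop. The Python sketch below is a hypothetical simplification, not COMPASS’s implementation. It uses a basic Gaussian evolution strategy with a fixed, shrinking step size, and `toy_score` is an invented stand-in for decoding a solution from a policy conditioned on a latent vector and scoring it. It only illustrates the pattern: sample many perturbed attempts, keep the best, and narrow the search around them.

```python
import random


def toy_score(z):
    """Hypothetical stand-in for 'decode a solution from the policy conditioned
    on latent z, then score it'. Higher is better; the maximum, 0.0, is at (0.7, -0.3)."""
    target = (0.7, -0.3)
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def latent_search(score, dim=2, population=100, elites=10,
                  generations=30, sigma=1.0, seed=0):
    """Simple Gaussian evolution strategy over a latent vector."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_z, best_s = list(mean), score(mean)
    for _ in range(generations):
        # "Throw" many perturbed attempts around the current mean.
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(population)]
        top = sorted(samples, key=score, reverse=True)[:elites]
        top_s = score(top[0])
        if top_s > best_s:
            best_z, best_s = top[0], top_s
        # "Home in": move the mean to the elites' average and narrow the spread.
        mean = [sum(z[i] for z in top) / elites for i in range(dim)]
        sigma *= 0.8
    return best_z, best_s
```

Each generation is one round of “throws”. Moving the mean to the average of the best attempts is the “homing in”, and shrinking `sigma` makes the adjustments smaller as the search converges.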

Ask anyone involved what the hardest part of this collaboration has been, and they give variations of the same answer. The teams think on fundamentally different timescales, and that is both the source of the tension and the source of the value. Applied teams, as Pretorius describes it, are under pressure to improve things greedily. The curve needs to go up continuously to satisfy clients. This produces steady, incremental progress. Research teams operate on a different logic entirely, one that requires accepting long periods of apparent inactivity in exchange for the possibility of a step change.

“If you want to propose something really new,” Chalumeau says, “that is going to break the previous performance by a large margin, you need to go in depth for several months, maybe more than a year, without bringing any results. You have to accept this.”

In practice, the friction can be felt on both sides, and Pretorius is disarmingly direct about it. From the applied team’s perspective, the research team has been around for six months and hasn’t shipped anything. From the research side, the incremental solutions can feel too small, or the problems require a different kind of thinking altogether.


“If you want to propose something really new, you need to go in-depth for several months, maybe more than a year, without bringing any results. You have to accept this.”

Felix Chalumeau, Senior Research Engineer, InstaDeep

Most technology companies resolve this tension by eliminating one side. Every resource goes to client-facing work, and the research team shrinks, or becomes a marketing function. Product teams end up fighting fires full-time. All of this can, in the very worst cases, lead to institutional burnout, not just in individuals, but in the product itself, which stops growing because no one has the space for a paradigm shift.

The willingness to let a team disappear into a problem for 18 months is what differentiates InstaDeep’s approach. The research team, as Chalumeau’s work demonstrates, functions as the organisation’s thinking time.

DeepPCB consistently handles boards with 500 to 1,000 wires. The commercial sweet spot begins at around 3,000 wires: the complex boards used in smartphones, servers, and aerospace. That gap is the current frontier.

The bottleneck, Pretorius explains, is speed. COMPASS now runs inside the DeepPCB engine. But the engine is complex and high-fidelity, and running thousands of inference attempts through it is slow. “If you could wave a wand and make everything a thousand times faster, we would be solving industrial problems that are currently considered untouchable.” Speed, in other words, is not just a performance metric: it determines how complex a board the system can tackle.

There is also the question of quality. A board must not merely work; it must look right to an experienced engineer. It must handle crosstalk and thermal management with an elegance that veteran designers recognise instinctively. Even a technically valid board can carry the hallmarks of a system that doesn’t quite understand what it’s doing. That last mile of craft is what the team is still working toward. Pretorius describes the trajectory as a slow exponential curve. For a long time, very little appears to be happening. Then it picks up. He is optimistic about the first half of this year. “I think it’s doable,” he says. “If we really put our heads down and focus.”

The same feedback loop that built DeepPCB is now being applied more broadly. Lecat, from his new position in commercial strategy, is watching it replicate in the bio and agri-tech sectors too. The pattern is the same: business identifies a market need, domain experts define the constraints, research builds the mathematics to solve it, and the results inform the next cycle of research.

“Input from research helps product,” Lecat says, “input from product helps research. That’s the only way we can outpace everyone else. This story has been told since the early days of InstaDeep. But in the past, there was such a gap between the teams that it didn’t get put into practice.” He pauses. “What I love seeing now is that this isn’t true anymore.”