Using agents to optimize liquid handling transfers
At onepot, we run a lot of chemistry! Our platform ingests compound requests from customers' orders, internal screens, and AI-agent-designed experiments, then executes them on automated liquid handlers. Every run involves dispensing reagents from source racks into synthesis plates: hundreds of aspirate-dispense cycles per batch, thousands per day. Each cycle means picking up tips, aspirating from source tubes, traversing to destination wells, dispensing column by column, and discarding used tips. The fewer cycles we need, the faster a batch completes and the less consumable waste we produce.
The sequence sounds simple: given a list of reactions, assign each one to a well on a plate, then schedule the aspirate-dispense cycles to execute them. In practice, the constraints make it surprisingly hard. The liquid handler has 8 independent channels on a single Y-axis gantry. Each cycle can aspirate from up to 8 source wells on the same rack. Tip volumes are limited: 50 µL in 384-well mode, so high-volume sources require multiple “chunks” across separate cycles. Sources from different racks can never share a cycle. And all of this has to work across five 384-well plates with 1,920 reactions in a single batch.
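To make these constraints concrete, here is a minimal validity check for a single cycle (a sketch; the tuple layout and rack names are illustrative, not our production data model):

```python
TIP_UL = 50.0   # usable tip volume in 384-well mode
CHANNELS = 8    # independent channels on the Y-axis gantry

def valid_cycle(transfers):
    """Check one aspirate-dispense cycle against the hardware constraints.

    `transfers` is a list of (source_rack, source_well, volume_ul)
    tuples, one per channel used in the cycle.
    """
    if len(transfers) > CHANNELS:
        return False  # at most 8 channels per cycle
    if len({rack for rack, _, _ in transfers}) > 1:
        return False  # sources from different racks can never share a cycle
    return all(vol <= TIP_UL for _, _, vol in transfers)

# A full 8-channel cycle drawing from one rack is valid:
print(valid_cycle([("amine_rack_1", f"A{i}", 25.0) for i in range(8)]))  # True
# Mixing racks in a single cycle is not:
print(valid_cycle([("amine_rack_1", "A1", 25.0),
                   ("amine_rack_2", "B1", 25.0)]))                       # False
```

Transfers above 50 µL never pass this check directly; they get split into multiple chunks, each scheduled in its own cycle.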
We had a working implementation. It was decent — it grouped reactions by source, interleaved them for column synchronization, and scheduled cycles with basic batching. But we suspected there was significant room for improvement. The question was how much, and how to find it without trial-and-error on actual hardware.
The autoresearch framework
Inspired by Andrej Karpathy's autoresearch concept, we built an instant-evaluation benchmark for rack transfer optimization. The idea is straightforward: define the problem with real production data, create a simulation that computes the exact cycle count without touching hardware, and let an AI agent iterate autonomously. Each experiment is evaluated in milliseconds. The agent can run dozens of experiments per hour, committing improvements and discarding failures, advancing through the search space far faster than a human could.
As the benchmark, we used four test cases drawn from real production batches: two urea synthesis runs (1,920 reactions each across five 384-well plates) and two Suzuki coupling runs. The objective is simple: minimize total aspirate-dispense cycles. Tip consumption is a secondary, but still important, metric.
Two files are editable: one controls how reactions are assigned to destination wells on plates, the other controls how those reactions are grouped into physical aspirate-dispense cycles. Everything else — the evaluation harness, the test data, the validation logic — is read-only. The agent can modify the algorithm however it wants, as long as every reaction ends up in a unique destination well and all volume constraints are satisfied.
Assignment matters
A fully naive approach — assigning reactions to wells in the order they arrive, with no grouping — scatters source racks across each plate. The robot ends up making many partial cycles: 3 channels here, 5 channels there, rarely filling all 8. For our benchmark, this produces roughly 580 cycles!
Our initial implementation was already much smarter — grouping reactions by source rack, interleaving for column synchronization, and scheduling chunks sequentially. It came in at 462 cycles. Good, but far from optimal.
Plate 4 of 5 — each well colored by source rack (real production data). Legend: amine1, amine2. Naive sequential assignment: both tags are scattered — the liquid handler constantly switches between racks, wasting channels on every cycle.
The visualization above shows the core tradeoff. Each reaction requires reagents from two different source racks (one per tag). In the naive layout, both tags are scattered. Sorting by the first amine perfectly groups one tag — but actively disrupts the other, making the second amine even more scattered than naive. The optimized layout balances both: neither tag is perfectly grouped, but both are well-clustered. This balance is what minimizes total cycles across all tags simultaneously.
What the agent discovered
The agent ran autonomously, trying ideas, evaluating instantly, keeping what worked, and discarding what didn't. Four optimizations emerged, each building on the last:
Optimal cycle scheduling. The original scheduler processed all first-chunks before second-chunks, creating partial batches at each level boundary. The agent replaced this with a load-balanced bin-packing approach that computes the provably minimum number of cycles per rack. This alone dropped the number of cycles to 403.
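The difference between the two schedulers can be sketched with a simplified model, in which `chunks_per_source` lists how many tip-sized chunks each source well needs and any 8 chunks from the same rack may share a cycle (both assumptions for illustration):

```python
import math

CHANNELS = 8

def cycles_level_by_level(chunks_per_source):
    """Old scheduler: process all first chunks, then all second chunks,
    and so on. Each level boundary can leave a partial batch."""
    cycles, level = 0, 0
    while True:
        pending = sum(1 for c in chunks_per_source if c > level)
        if pending == 0:
            return cycles
        cycles += math.ceil(pending / CHANNELS)
        level += 1

def cycles_bin_packed(chunks_per_source):
    """Load-balanced: pack all of the rack's chunks into the provable
    minimum number of cycles, ignoring chunk levels."""
    return math.ceil(sum(chunks_per_source) / CHANNELS)

# 12 sources needing 2 chunks each (24 chunks total):
print(cycles_level_by_level([2] * 12))  # 2 + 2 = 4 cycles
print(cycles_bin_packed([2] * 12))      # ceil(24 / 8) = 3 cycles
```

The level-by-level scheduler wastes four channels at each level boundary; bin-packing fills them.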
Source-aware plate assignment. The agent discovered that sorting experiments by their source rack identifiers before splitting into plates concentrates each rack's reactions on fewer plates, dramatically reducing cross-plate overhead. It tried every permutation of tag orderings — since each reaction has multiple reagent sources from different racks, which tag dominates the sort matters. This brought us down to 330 cycles.
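A sketch of the permutation search, using distinct (rack, plate) appearances as a stand-in for cross-plate overhead (the metric, the dict-based reaction format, and the tag and rack names in the example are assumptions for illustration):

```python
from itertools import permutations

PLATE_WELLS = 384

def rack_plate_spread(reactions, tag_order):
    """Sort reactions by their source racks in the given tag priority,
    split into plates, and count distinct (rack, plate) appearances.
    Each reaction is a dict mapping tag name -> source rack id."""
    ordered = sorted(reactions, key=lambda r: tuple(r[t] for t in tag_order))
    spread = set()
    for i, rxn in enumerate(ordered):
        plate = i // PLATE_WELLS
        for rack in rxn.values():
            spread.add((rack, plate))
    return len(spread)

def best_tag_order(reactions, tags):
    """Try every permutation of tag priorities and keep the one that
    concentrates racks onto the fewest (rack, plate) pairs."""
    return min(permutations(tags),
               key=lambda order: rack_plate_spread(reactions, order))

# 768 reactions (two plates); amine2 uses few large racks, amine1 many small:
rxns = [{"amine1": f"A{i % 8}", "amine2": f"B{i % 2}"} for i in range(768)]
print(best_tag_order(rxns, ("amine1", "amine2")))  # ('amine2', 'amine1')
```

In this toy case, sorting by the big amine2 racks first keeps each of them on a single plate without spreading the amine1 racks any further, so that permutation wins.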
Greedy + hill-climbing assignment. Sort-and-split is a batch assignment — it doesn't consider the marginal impact of each individual reaction. The agent built a greedy assigner that places each experiment on the plate where it causes the least increase in estimated cycles, considering all reagent tags jointly. It then applies hill-climbing: sampling swap candidates between plates and accepting swaps that reduce actual cycle count. Yet another optimization, 317 cycles.
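A condensed sketch of the two stages (the `plate_cost` estimate here ignores volume chunking, and the swap loop scores with the same estimate rather than the actual cycle count; both are simplifications of what the agent built):

```python
import math
import random

CHANNELS, PLATE_WELLS = 8, 384

def plate_cost(reactions):
    """Estimated cycles for one plate: each rack needs at least
    ceil(wells / 8) cycles. Reactions are tuples of source rack ids."""
    counts = {}
    for rxn in reactions:
        for rack in rxn:
            counts[rack] = counts.get(rack, 0) + 1
    return sum(math.ceil(n / CHANNELS) for n in counts.values())

def greedy_assign(reactions, n_plates):
    """Place each reaction on the plate where it adds the least
    estimated cycles, considering all its racks jointly."""
    plates = [[] for _ in range(n_plates)]
    for rxn in reactions:
        open_plates = [p for p in plates if len(p) < PLATE_WELLS]
        best = min(open_plates,
                   key=lambda p: plate_cost(p + [rxn]) - plate_cost(p))
        best.append(rxn)
    return plates

def hill_climb(plates, iters=200, seed=0):
    """Swap random reactions between plates; keep a swap only if it
    lowers the total estimated cost, otherwise revert it."""
    rng = random.Random(seed)
    cost = sum(plate_cost(p) for p in plates)
    for _ in range(iters):
        a, b = rng.sample(range(len(plates)), 2)
        if not plates[a] or not plates[b]:
            continue
        i, j = rng.randrange(len(plates[a])), rng.randrange(len(plates[b]))
        plates[a][i], plates[b][j] = plates[b][j], plates[a][i]
        new_cost = sum(plate_cost(p) for p in plates)
        if new_cost < cost:
            cost = new_cost
        else:  # revert a non-improving swap
            plates[a][i], plates[b][j] = plates[b][j], plates[a][i]
    return plates, cost
```

Starting from a deliberately bad split such as `[[("A0",), ("A1",)], [("A0",), ("A1",)]]`, the swap loop quickly converges to one rack per plate, halving the estimated cost.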
Accurate cost model. The greedy assigner initially estimated chunk counts using a simple formula: ⌈total_volume / usable_volume⌉. But the actual chunking uses sequential first-fit packing, which can produce more chunks when volumes don't divide evenly. The agent replaced the estimate with an exact first-fit simulation matching the real scheduler. This gave the greedy better information about marginal costs, and different test cases turned out to benefit from different cost models — so the final version runs both and picks the one that produces fewer actual cycles. This closed the gap, resulting in 312 cycles.
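The mismatch is easy to reproduce (a sketch; the 50 µL tip limit comes from the hardware described above, while the first-fit rule is a simplified stand-in for the real chunker):

```python
import math

TIP_UL = 50.0  # usable tip volume in 384-well mode

def chunk_estimate(volumes):
    """Ceiling formula: assumes volumes pack perfectly into tips."""
    return math.ceil(sum(volumes) / TIP_UL)

def chunk_exact(volumes):
    """Sequential first-fit packing: a new chunk starts whenever the
    next transfer would overflow the tip, so uneven volumes can
    produce more chunks than the ceiling formula predicts."""
    chunks, current = 0, 0.0
    for v in volumes:
        if current + v > TIP_UL and current > 0:
            chunks += 1
            current = 0.0
        current += v
    return chunks + (1 if current > 0 else 0)

vols = [30, 30, 30]  # 90 uL total: the estimate says 2 tips, packing needs 3
print(chunk_estimate(vols))  # 2
print(chunk_exact(vols))     # 3
```

A greedy assigner scoring plates with `chunk_estimate` would undercount the marginal cost of awkward volumes like these; `chunk_exact` gives it the same answer the scheduler will later produce.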
Optimization waterfall — total aspirate-dispense cycles
Each layer adds a new optimization. Final result: 32% fewer cycles than the initial implementation.
The result
Total cycles across all test cases
From fully naive to AI-optimized: 46% reduction in physical robot operations.
From a naive implementation at 580 cycles to an AI-optimized solution at 312: a 46% reduction in physical robot operations! Compared to our initial production implementation (462 cycles), the improvement is 32%. At roughly 20–60 seconds per cycle on our hardware, those 150 fewer cycles save 50 minutes to 2.5 hours per batch, plus a proportional reduction in tip consumption, which means less consumable waste.
The agent explored 20 distinct ideas over the course of a single session. Some were dead ends: FFD bin-packing for volume chunks, frequency-based ordering, cycle reordering for tip packing — all tried, all produced no improvement on this dataset and were correctly discarded. The experiments that worked shared a common theme: they reduced the number of source-well appearances across plates, maximized channel utilization per cycle, and made globally-informed assignment decisions rather than local ones.
The agent increases lab throughput
Our lab pipeline runs continuously. Compound requests arrive from three sources: customer orders with specific target molecules, internal screening campaigns exploring chemical space, and Phil-designed experiments where our AI chemist proposes the next molecules to make. All of these converge on the same liquid handlers. When a batch of 1,920 reactions takes 312 cycles instead of 462, we complete it faster, free the instrument sooner, and start the next batch earlier. Over weeks and months, the compounding effect is substantial!
The autoresearch approach also scales to new problems. The same framework — instant evaluation, autonomous iteration, keep-or-discard discipline — can be applied to trough dispensing schedules, purification queue ordering, or any workflow where the objective is well-defined and evaluation is cheap. We plan to use it wherever hardware efficiency has a direct impact on throughput.
The optimized algorithms are now running and producing molecules. If you're interested in how we run chemistry at scale, reach out to us at hello@onepot.ai.