Case study · Internal research · 2026

An auction-listing tool, and the finding that an auction-specific model needs auction-house data.

A planned auction-specialist machine-learning model, intended as a sibling to the existing London Property Finder, was abandoned mid-build when the obvious training corpus (HM Land Registry's price-paid category B) turned out to be a heterogeneous bucket of non-standard transactions. Its median price is 47 per cent above that of the standard subset, not the 10 to 20 per cent below it that an auction-discount model would require. This case study documents the finding, the resulting pivot, and the interim flag-and-discount tool below.

Applied machine learning · Data audit · Disciplined model abandonment · Internal research

01 · The plan that was abandoned

A parallel auction model, trained on PPD category B.

The original brief was to train an auction-specialist model in parallel to the retail valuation model documented in the London Property Finder case study. Auction sales are a known sub-market with a structurally different price distribution, and HM Land Registry's price-paid data includes a category B (“Additional Price Paid”) that appeared, on first reading, to capture auctions within its wider set of non-standard transactions. The plan was to apply the same Sprint 0 to Sprint 7 modelling pipeline to category B, ship a parallel stack, and surface its predictions on the listings tool below.

The data refused to support this plan, and on the strength of the first audit the build was halted before any model was trained.

02 · The finding

Category B is 47 per cent above category A at the median.

Across the 92,612 London category-B rows that survived the same EPC matching, postcode merge, zone, tenure and quality filters as the retail training table, the price distribution is systematically higher than the standard subset at every percentile: by 47 per cent at the median, 79 per cent at the lower quartile, and 31 per cent at the 95th percentile. The shift runs in the opposite direction from what an auction-discount model would require.
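The comparison behind these figures can be sketched in a few lines. This is a minimal illustration, not the audit code: the function names are hypothetical, the interpolation method is an assumption, and the prices are invented toy values rather than price-paid data.

```python
from statistics import quantiles


def percentile(values, p):
    """Return the p-th percentile (1 to 99) using inclusive interpolation."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[p - 1]


def uplift_pct(cat_b, cat_a, p):
    """Percentage by which category B sits above category A at percentile p.

    A negative result would indicate a discount relative to category A.
    """
    a = percentile(cat_a, p)
    b = percentile(cat_b, p)
    return 100.0 * (b - a) / a


# Toy prices, invented for illustration only; NOT the audited data.
cat_a_prices = [300_000, 400_000, 500_000, 600_000, 700_000]
cat_b_prices = [450_000, 600_000, 750_000, 900_000, 1_050_000]

for p in (25, 50, 95):
    print(p, round(uplift_pct(cat_b_prices, cat_a_prices, p), 1))
```

In the toy data every category-B price is 1.5 times its category-A counterpart, so the uplift is 50 per cent at each percentile. An auction-discount corpus would need to return negative values here.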

The reason is straightforward. Category B is documented by the Land Registry as a catch-all for non-standard transactions: transfers under a power of sale and repossessions sit alongside corporate-to-corporate transfers, buy-to-let portfolio exits, matrimonial settlements between high-net-worth spouses, and additional-price entries above the standard chargeable consideration. In the London subset, the upward pull of the non-distressed entries (premium corporate sales, HNW divorces) overwhelms the downward pull of the genuine auction sales. There is no further granularity in price-paid data that would isolate the auction subset, and so a model trained on category B as a whole would learn the price distribution of corporate and matrimonial transactions, not auction prices. Calling such a model an “auction model” would actively mislead the user.
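A toy calculation shows how a mixed bucket can sit above the standard median even when it contains genuinely discounted sales. Every number below is invented for illustration. The true composition of category B cannot be recovered from price-paid data, so the mix is purely hypothetical.

```python
from statistics import median

# All prices are invented for illustration; none come from price-paid data.
standard_median = 500_000

# Hypothetical category-B mix: a minority of discounted auction-style sales
# alongside a majority of premium non-distressed transfers.
auction_like = [400_000, 425_000, 450_000]
non_distressed = [700_000, 750_000, 800_000, 900_000, 1_200_000]

bucket = auction_like + non_distressed


def gap_pct(value, reference):
    """Percentage gap of value relative to reference (negative = discount)."""
    return 100 * (value - reference) / reference


auction_gap = gap_pct(median(auction_like), standard_median)
bucket_gap = gap_pct(median(bucket), standard_median)

print(f"auction-like subset vs standard: {auction_gap:+.0f}%")
print(f"whole bucket vs standard:        {bucket_gap:+.0f}%")
```

The auction-like subset sits below the standard median, yet the median of the whole bucket sits above it. A model fitted to the whole bucket would learn the second figure, not the first.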

03 · The pivot

Flag, surface, and apply a literature-derived discount band.

With auction-house results data deferred to a v2 engagement (Allsop, Savills Auctions, EIG and BidX1 publish historical results suitable for this purpose, but acquiring and reconciling them is a separate exercise), the interim approach is to:

Detect auction-eligible listings in the existing eval pool, both from Rightmove's own auction flag and from the listing description text.
Score those listings with the existing retail model (p7_6_combined_stack) and surface a clear “AUC” (auction) badge next to the prediction, together with a literature-derived band of 80–95 per cent of the retail estimate, that is, a discount of 5 to 20 per cent. The band is taken from the published academic and trade literature on auction clearing prices, not from any model trained inside this engagement.
Document the finding so that the next engagement does not repeat the assumption.
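The detection step can be sketched as follows. This is a minimal illustration under stated assumptions: the `Listing` fields, the `portal_auction_flag` name and the keyword pattern are all hypothetical, since the source does not specify which phrases or fields the tool actually uses.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: matches "auction", "auctions", "auctioneer", etc.
# The phrases the real tool looks for are not given in the source.
AUCTION_PATTERN = re.compile(r"\bauction\w*\b", re.IGNORECASE)


@dataclass
class Listing:
    listing_id: str
    description: str
    portal_auction_flag: bool = False  # assumed name for the portal's own flag


def is_auction_eligible(listing: Listing) -> bool:
    """True if the portal flags the listing or its description mentions auction."""
    if listing.portal_auction_flag:
        return True
    return AUCTION_PATTERN.search(listing.description) is not None


listings = [
    Listing("a1", "Two-bed flat, chain free.", portal_auction_flag=True),
    Listing("a2", "For sale by AUCTION on behalf of receivers."),
    Listing("a3", "Spacious family home close to the park."),
]

eligible_ids = [l.listing_id for l in listings if is_auction_eligible(l)]
print(eligible_ids)
```

A plain keyword match will also catch phrases such as "not for auction", so a production version would need a negation check or a manual review step.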

04 · The tool

Filter and rank auction-eligible London listings.

The interface below loads the same eval pool as the retail tool but defaults to showing only auction-eligible listings. The model column is the retail valuation; the auction-est column shows the literature-derived 80–95 per cent retail-discount band. The notice at the top of the embedded interface restates the caveat in full.
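The two columns and the default filter can be sketched as below. The row keys, the function names and the choice to sort by retail estimate are assumptions made for illustration. The source states only that the band is 80–95 per cent of the retail estimate and that auction-eligible listings are shown by default.

```python
# Band bounds as whole percentages of the retail estimate (from the text above).
BAND_LOW_PCT = 80
BAND_HIGH_PCT = 95


def auction_band(retail_estimate: int) -> tuple[int, int]:
    """Return (low, high) of the literature-derived band, in whole pounds."""
    low = retail_estimate * BAND_LOW_PCT // 100
    high = retail_estimate * BAND_HIGH_PCT // 100
    return low, high


def auction_view(rows, auction_only=True):
    """Filter to auction-eligible rows and attach the band columns.

    `rows` is a list of dicts with hypothetical keys 'id', 'retail_estimate'
    and 'auction_eligible'. Sorting by retail estimate is an assumed ranking.
    """
    view = []
    for row in rows:
        if auction_only and not row["auction_eligible"]:
            continue
        low, high = auction_band(row["retail_estimate"])
        view.append({**row, "auction_est_low": low, "auction_est_high": high})
    return sorted(view, key=lambda r: r["retail_estimate"], reverse=True)


rows = [
    {"id": "a1", "retail_estimate": 500_000, "auction_eligible": True},
    {"id": "a2", "retail_estimate": 750_000, "auction_eligible": True},
    {"id": "a3", "retail_estimate": 900_000, "auction_eligible": False},
]

for r in auction_view(rows):
    print(r["id"], r["retail_estimate"], r["auction_est_low"], r["auction_est_high"])
```

The band is a fixed multiple of the retail prediction, so it carries no per-listing information beyond what the retail model already provides. This is why the tool presents it as a caveat-bearing range rather than a second prediction.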

Open the tool full screen (new tab)

05 · What a v2 would require

Auction-house results data, not Land Registry data.

A defensible auction-specialist model requires sold prices from the auction-house records themselves, not the catch-all secondary classification in the price-paid data. The candidate sources are Allsop (the largest London residential auctioneer, with published catalogues and sold prices); Savills Auctions; Essential Information Group (EIG, a paid subscription with comprehensive UK auction coverage); and BidX1 (the largest online-only auction operator).

With auction-cleared prices in hand, the same Sprint 0–7 pipeline that produced the retail model would produce a defensible auction model, and the flag-and-discount UX above would be replaced with a learned per-listing prediction. The pipeline is architecturally ready; only the data is missing.

06 · A comparable engagement?

An audit of the data assumptions before the modelling begins.

If the question under consideration is whether a particular public data corpus is sufficient to support a particular commercial decision, or whether a model already in production is biased in ways that have not been audited, a thirty-minute introduction is usually sufficient to establish whether further engagement would be productive.

Arrange an introduction · Read the retail case study