Case study · Internal research · 2026
A live London house-price valuation tool, built on public data and a stacked machine-learning ensemble.
An interactive single-page tool that scores every current London for-sale listing against a stacked machine-learning ensemble trained on 2.18 million Land Registry transactions, with sold-nearby comparables, an 80 per cent prediction interval, and an on-page filterable map. The methodology, the data sources, and the evaluation are documented in detail in the embedded tool below.
£38,832
gap MAE (−32 per cent vs baseline)
68.3 %
within ±10 per cent of asking
2.18M
training transactions (1995–2026)
5,300
live London listings scored
01 · The question
Whether public data alone is sufficient to identify mispriced London property listings.
The English property market exposes an unusually rich corpus of public information: HM Land Registry's Price Paid Data records every residential sale since 1995; the Ministry of Housing, Communities and Local Government publishes the Energy Performance Certificate bulk dataset for substantially every dwelling in England and Wales; and the Office for National Statistics maintains an active postcode directory with median household income, deprivation indices, and distances to public-transport access nodes. Whether this evidence base is, on its own, sufficient to identify mispriced listings on the live market is an empirically tractable question. This case study answers it.
The point of the exercise was not to produce yet another house-price model, of which there are many, but to discipline the modelling decisions against the standards used in the assessment community (the IAAO ratio statistics) and to expose the result through an interface that an end-user could actually interrogate, rather than through a static report.
02 · What was built
A stacked gradient-boosted ensemble with a quantile-loss meta-learner, exposed through a single-page interactive tool.
The training table is a 2.18 million-row London subset of Price Paid Data, joined to Energy Performance Certificates by address and to the ONS postcode lookup by full postcode. The modelling proceeded across seven sprints, each tested against an independent evaluation pool of approximately five thousand live Rightmove listings. Eight base models were trained, each with a different bias-variance trade-off (a quantile-loss median regressor, a per-property-type Optuna-tuned XGBoost, a description-embedding residual learner, and so on). Their out-of-fold predictions were then combined under a quantile-regression meta-learner at the median.
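The stacking step can be illustrated in miniature. The sketch below is not the production pipeline: it substitutes two toy base models and a one-parameter blend for the eight base models and the quantile-regression meta-learner, and every name and figure in it is hypothetical. What it preserves is the structure: each base model predicts only rows it was not fitted on, and the blend weight is then chosen on those out-of-fold predictions under absolute (median) loss.

```python
from statistics import mean, median

def fit_global_median(train):
    """Toy base model A: ignore features, predict the training median price."""
    m = median(price for _, price in train)
    return lambda area: m

def fit_price_per_sqm(train):
    """Toy base model B: median price per square metre times floor area."""
    rate = median(price / area for area, price in train)
    return lambda area: rate * area

def out_of_fold_predictions(rows, fitters, n_folds=3):
    """For each row, predict with models fitted on the other folds only."""
    oof = [None] * len(rows)
    for fold in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != fold]
        models = [fit(train) for fit in fitters]
        for i, (area, _) in enumerate(rows):
            if i % n_folds == fold:
                oof[i] = [model(area) for model in models]
    return oof

def blend_mae(w, oof, targets):
    """Mean absolute error of the blend w * A + (1 - w) * B."""
    return mean(abs(w * a + (1 - w) * b - y) for (a, b), y in zip(oof, targets))

def fit_median_blend(oof, targets, steps=100):
    """Stand-in meta-learner: grid-search the weight that minimises absolute error."""
    return min((k / steps for k in range(steps + 1)),
               key=lambda w: blend_mae(w, oof, targets))

# Hypothetical (floor area in square metres, price in pounds) rows.
rows = [(50, 300_000), (60, 365_000), (70, 410_000),
        (80, 490_000), (90, 530_000), (100, 2_000_000)]
targets = [price for _, price in rows]

oof = out_of_fold_predictions(rows, [fit_global_median, fit_price_per_sqm])
w = fit_median_blend(oof, targets)
print(f"blend weight on model A: {w:.2f}")
```

Fitting the meta-learner on out-of-fold rather than in-sample predictions keeps a base model that has memorised its training rows from being over-weighted in the blend.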
The tool below ranks every current Rightmove listing by the difference between its asking price and the model's prediction, with filters across property type, bedrooms, asking band, station distance, and date added to market. Each listing carries an 80 per cent prediction interval and a sold-nearby comparables table drawn from the same Land Registry corpus, HPI-deflated to the latest London index. A list view and an interactive map view share the same filter state.
Engineering notes
A selection of the decisions worth naming.
Median loss rather than squared loss
The single change that had the largest effect on the within‑ten‑per‑cent metric was the move from squared-error to quantile-error at q = 0.5 (equivalently, the absolute-error / least-absolute-deviations objective) for both the strongest base model and the meta-learner over the eight bases. Property-price residuals are heavy-tailed; squared error pulls predictions toward outliers in a way that hurts the proportion of accurate predictions even where it improves the mean error. The trade-off was made explicitly and documented in the experiments register.
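The effect is easiest to see with a constant predictor. In the minimal sketch below, the prices and the helper names are illustrative assumptions rather than project data: the mean is the constant that minimises squared error, the median is the constant that minimises absolute error, and a single outlier is enough to separate them.

```python
from statistics import mean, median

def within_ten_per_cent(prediction, prices):
    """Share of prices for which the prediction lands within +/-10 per cent."""
    hits = sum(abs(prediction - p) <= 0.10 * p for p in prices)
    return hits / len(prices)

def mean_squared_error(prediction, prices):
    return mean((prediction - p) ** 2 for p in prices)

# Hypothetical prices for one street; the last sale is an outlier.
prices = [400_000, 410_000, 420_000, 430_000, 440_000, 1_500_000]

squared_loss_fit = mean(prices)     # constant minimising squared error
absolute_loss_fit = median(prices)  # constant minimising absolute error (q = 0.5)

print(squared_loss_fit, within_ten_per_cent(squared_loss_fit, prices))
print(absolute_loss_fit, within_ten_per_cent(absolute_loss_fit, prices))
```

On these figures the squared-loss fit has the lower mean squared error but lands within ten per cent of none of the six prices, while the median fit lands within ten per cent of five of them.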
Honest evaluation, IAAO-conformant
The model is reported against the International Association of Assessing Officers ratio statistics (the Coefficient of Dispersion, the Price-Related Differential, and the Price-Related Bias) in addition to the more familiar regression metrics. The Coefficient of Dispersion of 8.6 is comfortably inside the IAAO <15 "excellent" band; the Price-Related Differential is approximately 1.0, indicating no material vertical inequity across the price distribution.
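For readers unfamiliar with the ratio statistics, the sketch below shows one way to compute them. It assumes the model estimate plays the role of the assessed value and the reference price (asking or sold) the role of the sale price. The formulas follow the usual formulation of the IAAO Standard on Ratio Studies as understood here and should be checked against the standard before reuse. The function name and the sample figures are hypothetical.

```python
from math import log
from statistics import mean, median

def ratio_statistics(estimates, prices):
    """Return (COD, PRD, PRB) for paired model estimates and reference prices."""
    ratios = [e / p for e, p in zip(estimates, prices)]
    med = median(ratios)

    # COD: mean absolute deviation from the median ratio, as a percentage of it.
    cod = 100 * mean(abs(r - med) for r in ratios) / med

    # PRD: simple mean ratio divided by the value-weighted mean ratio.
    prd = mean(ratios) / (sum(estimates) / sum(prices))

    # PRB: least-squares slope of the relative ratio deviation on log2 of a
    # value proxy that averages the price and the median-adjusted estimate.
    x = [log(0.5 * p + 0.5 * e / med) / log(2) for e, p in zip(estimates, prices)]
    y = [(r - med) / med for r in ratios]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    prb = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return cod, prd, prb

# Hypothetical regressive pattern: cheap homes over-estimated, dear homes under-estimated.
prices = [200_000, 400_000, 800_000]
estimates = [220_000, 400_000, 720_000]
cod, prd, prb = ratio_statistics(estimates, prices)
print(f"COD={cod:.2f}  PRD={prd:.3f}  PRB={prb:.3f}")
```

A PRD above 1 together with a negative PRB, as in this toy input, is the signature of regressivity; a uniformly proportional model would give a COD of 0, a PRD of 1, and a PRB of 0.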
Sold-nearby as a model-free check
Each listing is paired with the fifteen nearest sold transactions of the same property type from Land Registry, with prices re-expressed in current-month terms via the ONS House Price Index. The median of these comparables is shown alongside the model prediction in the interface, so a user can see immediately whether the model's view diverges from recent local sales. Where it does, the disagreement is itself the signal of interest.
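The comparables check amounts to three steps: restrict to the same property type, take the nearest sales, and rescale each price by the ratio of the current index to the index in the month of sale. The sketch below assumes coordinates are British National Grid eastings and northings in metres, so that straight-line distance is meaningful. The field names, index values, and sales are hypothetical, and k is reduced from fifteen to two to keep the example readable.

```python
from math import hypot
from statistics import median

def comparables_median(listing, sales, hpi, current_month, k=15):
    """Median index-adjusted price of the k nearest same-type sales, or None."""
    same_type = [s for s in sales if s["type"] == listing["type"]]
    nearest = sorted(
        same_type,
        key=lambda s: hypot(s["easting"] - listing["easting"],
                            s["northing"] - listing["northing"]),
    )[:k]
    # Re-express each sold price in current-month terms.
    adjusted = [s["price"] * hpi[current_month] / hpi[s["month"]] for s in nearest]
    return median(adjusted) if adjusted else None

# Hypothetical index levels and sales.
hpi = {"2020-01": 100.0, "2023-01": 125.0, "2026-01": 150.0}
listing = {"type": "T", "easting": 0, "northing": 0}
sales = [
    {"type": "T", "easting": 10, "northing": 0, "price": 300_000, "month": "2020-01"},
    {"type": "T", "easting": 20, "northing": 0, "price": 400_000, "month": "2023-01"},
    {"type": "T", "easting": 5000, "northing": 0, "price": 900_000, "month": "2026-01"},
    {"type": "F", "easting": 1, "northing": 0, "price": 100_000, "month": "2026-01"},
]
comp_median = comparables_median(listing, sales, hpi, "2026-01", k=2)
print(comp_median)
```

Because the median involves no fitted parameters, a large gap between it and the model estimate points either to a property that differs from its neighbours or to a model error, which is why the interface shows the two side by side.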
Calibrated prediction intervals
The interface carries an 80 per cent prediction interval (10th and 90th conditional quantiles) per listing, produced by a quantile XGBoost trained alongside the point estimator. A confidence score derived from the band width is surfaced as a sortable column. A listing whose asking price falls outside the band is a stronger mispricing signal than one with a pct_diff (the percentage gap between asking price and model prediction) of the same magnitude whose asking price sits inside the interval.
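The interval logic can be sketched as follows. The sign convention for pct_diff, the confidence formula, and all helper names are assumptions made for illustration; the source does not specify how the score is derived from the band width. The final helper shows the basic calibration check: on held-out data, roughly 80 per cent of prices should fall between the 10th and 90th quantile predictions.

```python
def pct_diff(asking, predicted):
    """Signed gap between asking price and model estimate, in per cent (assumed convention)."""
    return 100 * (asking - predicted) / predicted

def outside_band(asking, q10, q90):
    """True when the asking price lies outside the 80 per cent interval."""
    return asking < q10 or asking > q90

def band_confidence(q10, q90, predicted):
    """Hypothetical score: 1 for a zero-width band, falling to 0 as the band widens."""
    return max(0.0, 1.0 - (q90 - q10) / predicted)

def empirical_coverage(prices, bands):
    """Share of held-out prices that fall inside their (q10, q90) band."""
    inside = sum(low <= p <= high for p, (low, high) in zip(prices, bands))
    return inside / len(prices)

# Two listings with the same pct_diff but different bands.
print(pct_diff(330_000, 300_000), outside_band(330_000, 270_000, 320_000))
print(pct_diff(330_000, 300_000), outside_band(330_000, 250_000, 360_000))
```

Both example listings sit ten per cent above the estimate, but only the first falls outside its band, so only the first would be treated as the stronger signal.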
03 · The tool
Filter and rank live London listings against the model.
The interface below loads the full evaluation pool of approximately five thousand London listings. The default view shows two- or three-bedroom Terraced, Semi-detached, and Detached properties under £450,000 within five hundred metres of a station, added in the last fortnight, where the asking price falls inside the model's 80 per cent prediction interval and is below the model's central estimate. Every filter, sort, and view is interactive. A separate map view is available via the toggle at the top right of the embedded interface.
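Expressed as code, the default view is a conjunction of simple predicates followed by a sort on the relative gap. The sketch below is an illustration only: the Listing fields, the function name, and the sample rows are hypothetical and do not describe the tool's actual data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Listing:
    property_type: str
    bedrooms: int
    asking: float
    predicted: float
    q10: float
    q90: float
    station_m: float
    added: date

def default_view(listings, today):
    """Apply the default filters, then rank by relative gap (most under-priced first)."""
    allowed_types = {"Terraced", "Semi-detached", "Detached"}
    earliest = today - timedelta(days=14)
    kept = [
        item for item in listings
        if item.property_type in allowed_types
        and item.bedrooms in (2, 3)
        and item.asking < 450_000
        and item.station_m <= 500
        and item.added >= earliest
        and item.q10 <= item.asking <= item.q90   # inside the 80 per cent band
        and item.asking < item.predicted          # below the central estimate
    ]
    return sorted(kept, key=lambda item: (item.asking - item.predicted) / item.predicted)

today = date(2026, 3, 15)
listings = [
    Listing("Terraced", 3, 400_000, 440_000, 390_000, 480_000, 350, date(2026, 3, 10)),
    Listing("Semi-detached", 2, 420_000, 430_000, 400_000, 470_000, 450, date(2026, 3, 5)),
    Listing("Flat", 2, 300_000, 350_000, 290_000, 400_000, 200, date(2026, 3, 12)),      # wrong type
    Listing("Detached", 3, 380_000, 460_000, 400_000, 520_000, 300, date(2026, 3, 12)),  # below the band
    Listing("Terraced", 2, 410_000, 430_000, 380_000, 470_000, 300, date(2026, 2, 1)),   # too old
]
shortlist = default_view(listings, today)
print([item.asking for item in shortlist])
```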
04 · Outcomes
An end-to-end research pipeline, an empirically defensible model, and a live interface.
The headline result is a thirty-two per cent reduction in mean absolute error against the v1 baseline (from £57,299 to £38,832) and a within-ten-per-cent proportion of 68.3 per cent, against an industry benchmark of approximately seventy per cent for published consumer automated-valuation models. The model passes the IAAO uniformity and vertical-equity standards.
- Eight base models combined under a quantile-regression meta-learner.
- Five thousand live Rightmove listings scored, with an 80 per cent prediction interval per listing.
- Sold-nearby comparables drawn from Land Registry, HPI-deflated to the latest London index.
- Single-page interactive interface: filterable list and map, multi-key sort, CSV and XLSX export.
05 · A comparable engagement?
A short conversation tends to be more useful than a written brief.
If the question under consideration is whether a domain-specific machine-learning model would benefit from being held to the same standards as a regulated assessment, or whether a public-data corpus is sufficient to support a particular commercial decision, a thirty-minute introduction is usually sufficient to establish whether further engagement would be productive.