Atalaya — A Watchtower over Chile’s Open Data


A watchtower over Chile’s open data. Atalaya harvests the Data Observatory catalog, profiles every downloadable table, and mines five kinds of cross-dataset relation into an explorable knowledge graph — turning a flat list of a thousand-plus datasets into a map of which ones join, overlap, correlate or share a source. Live at atalaya.fasl-work.com.

Atalaya — harvest → profile → relate → knowledge graph

Five relations, mined with care

Across 1,017 real datasets, Atalaya mines five relations: same-source, semantic similarity (a MiniLM model exported to ONNX), spatial overlap, joinability (MinHash containment) and statistical correlation (Spearman with a permutation null, Benjamini-Hochberg FDR control and a partial-correlation guard). A novel calibrated affinity score fuses these signals against null-distribution percentiles and can be re-weighted live. Atalaya ships as a static React SPA with the graph baked in and semantic search running client-side, so there is no backend.
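Joinability via MinHash containment can be illustrated with a minimal, stdlib-only sketch. It is not Atalaya's code: the value normalisation, the use of seeded BLAKE2b hashes as the permutation family, and the names `normalise`, `minhash_signature`, `jaccard_estimate` and `containment_estimate` are hypothetical. The idea is to estimate Jaccard similarity J from matching signature slots, then convert it to the containment of column A in column B using |A∩B| = J(|A|+|B|)/(1+J).

```python
import hashlib

# Hypothetical sketch of MinHash-based containment; not Atalaya's actual code.


def normalise(values):
    """Hypothetical cleaning step: distinct, trimmed, lower-cased, non-empty strings."""
    return {str(v).strip().lower() for v in values if v is not None and str(v).strip()}


def _seeded_hash(value: str, seed: int) -> int:
    """64-bit hash of `value`; each seed plays the role of one random permutation."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(distinct_values, num_perm=128):
    """For each seed, keep the minimum hash over the column's distinct values."""
    if not distinct_values:
        raise ValueError("cannot sign an empty column")
    return [min(_seeded_hash(v, seed) for v in distinct_values) for seed in range(num_perm)]


def jaccard_estimate(sig_a, sig_b):
    """Fraction of slots where the two minima agree; estimates |A∩B| / |A∪B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def containment_estimate(sig_a, sig_b, size_a, size_b):
    """Estimate |A∩B| / |A|, i.e. how much of column A is covered by column B.

    Uses |A∩B| = J * (|A| + |B|) / (1 + J); clipped to 1 because J is noisy.
    """
    j = jaccard_estimate(sig_a, sig_b)
    return min(1.0, j * (size_a + size_b) / ((1 + j) * size_a))


if __name__ == "__main__":
    # Toy columns: 50 codes that are all contained in a 200-code column.
    small = normalise(f"13{i:03d}" for i in range(50))
    large = normalise(f"13{i:03d}" for i in range(200))
    sig_s, sig_l = minhash_signature(small, 256), minhash_signature(large, 256)
    print(round(containment_estimate(sig_s, sig_l, len(small), len(large)), 2))
```

Containment rather than Jaccard is the natural measure here because a small lookup column fully covered by a large one has a low Jaccard score but is still perfectly joinable.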

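The calibrated affinity score described above can be illustrated with a minimal sketch. It assumes that each signal's raw score is converted to its percentile within a null distribution (for example, scores from dataset pairs believed to be unrelated) and that the percentiles are then averaged with user-adjustable weights. The signal names, the mid-rank tie rule, the weighted mean and the functions `null_percentile` and `affinity` are hypothetical choices for illustration, not Atalaya's published formula.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sketch of percentile-calibrated signal fusion; not Atalaya's formula.


def null_percentile(score, null_sorted):
    """Share of the (pre-sorted) null scores below `score`, counting ties as half."""
    below = bisect_left(null_sorted, score)
    ties = bisect_right(null_sorted, score) - below
    return (below + 0.5 * ties) / len(null_sorted)


def affinity(signals, nulls, weights):
    """Weighted mean of null percentiles over the signals present for one pair.

    signals: {signal_name: raw score for this dataset pair}
    nulls:   {signal_name: sorted raw scores from a null/background sample}
    weights: {signal_name: non-negative weight}; changing these re-weights live
    """
    total, weight_sum = 0.0, 0.0
    for name, score in signals.items():
        w = weights.get(name, 0.0)
        if w <= 0 or not nulls.get(name):
            continue  # skip switched-off signals and signals without a null sample
        total += w * null_percentile(score, nulls[name])
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    nulls = {"semantic": [0.1, 0.2, 0.3, 0.4], "join": [0.0, 0.0, 0.1, 0.5]}
    pair = {"semantic": 0.35, "join": 0.6}
    print(affinity(pair, nulls, {"semantic": 1.0, "join": 1.0}))
    print(affinity(pair, nulls, {"semantic": 1.0, "join": 3.0}))
```

Calibrating against a null puts every signal on the same scale: a percentile of 0.95 reads as "stronger than 95% of background pairs" whether the raw score was a cosine similarity, a containment estimate or a correlation magnitude, so re-weighting mixes comparable quantities.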
Honest about the graph

The number that matters is not the “~14,000 relationships”: most of those are cheap priors. The hard evidence is a few hundred joinable pairs and a handful of FDR-controlled correlations, and Atalaya labels evidence strength rather than hiding it. It never implies causation, and some small-n correlations are explicitly flagged as fragile. The modern embedding model also beats the classical TF-IDF baseline only modestly (~+1.4 points). The result is a relation explorer, reported at the confidence the data supports.
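What "FDR-controlled correlations" involves can be sketched in plain Python: Spearman's rho on average ranks, a permutation null for each pair, then Benjamini-Hochberg across every tested pair. This is an illustrative sketch, not Atalaya's implementation: the two-sided test on |rho|, the add-one p-value correction, the default of 999 permutations and q = 0.05 are choices made here, the function names are hypothetical, and the partial-correlation guard is not shown.

```python
import random
from statistics import fmean

# Hypothetical sketch: Spearman + permutation null + Benjamini-Hochberg.


def average_ranks(xs):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def spearman_permutation_test(x, y, n_perm=999, seed=0):
    """Spearman's rho plus a two-sided permutation p-value (add-one corrected)."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = ry[:]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any x-y association, keep the marginals
        if abs(pearson(rx, shuffled)) >= abs(rho):
            extreme += 1
    return rho, (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, q=0.05):
    """Step-up BH: reject the k smallest p-values, k = max{i : p_(i) <= i*q/m}."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

With very small n there are only n! distinct orderings, so the permutation null is coarse and a single row can swing rho substantially. That is consistent with flagging such correlations as fragile even when they survive FDR control.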

Live demo · GitHub repository