I help development banks, donor agencies, and public institutions turn messy administrative, fiscal, territorial, education, and electoral data into reproducible analysis, dashboards, and models they can defend, hand over, and re-run.
Trusted on programs for World Bank · GIZ · EU Delegation · national governments
A few of the situations where a senior, independent data hand earns its keep:
The last consultant left a spreadsheet nobody can re-run, and the one person who understood it is no longer on the contract.
A headline number is in the report, but when the auditor asks how it was produced, no one can trace it back to the data.
A forecast was read as a promise, the promise didn't hold, and now the method itself is under question.
Each of these is a reproducibility problem before it's a data problem. The practice below is built to prevent them.
Six things, done to the standard the institutions funding the work expect: documented, defensible, and built to be inherited by the next team.
Turning administrative, statistical, and register data into a clean, documented analytical dataset a decision can rest on — with the lineage and assumptions written down for whoever audits it later.
Indicators, baselines, monitoring logic, and evaluation evidence for donor-funded programs — to reporting standards, with data lineage documented for the auditors who come after.
National administrative geographies, census, and accessibility data joined on consistent boundaries — to show where interventions land, who they reach, and what amalgamation or reform would mean on the ground.
Fiscal and budget data across hundreds of local authorities, standardized and benchmarked — so revenue, spending, and capacity can be compared on a like-for-like basis.
Reproducible data pipelines in R, Quarto reports, and Shiny dashboards your team can re-run long after the engagement ends. No orphaned spreadsheets; a full, documented handover.
Statistical and electoral modelling — Monte Carlo simulation, seat allocation, benchmarking — delivered with the uncertainty quantified, never hidden, on pipelines that ingest public data reproducibly.
I use AI to speed up profiling, cleaning, documentation, and code review. Final outputs remain traceable to source data, version-controlled code, and human judgment. Sensitive client data is not handed to external AI tools. Domain knowledge decides what to trust, and every number stays reproducible.
A representative sample of engagements. Clients and figures are generalized under confidentiality; the shape of the work is real.
If any of these sound like the problem on your desk, the shape of the fix is below.
A reform program depended on a monthly reporting cycle assembled by hand from scattered administrative sources — slow, error-prone, and impossible to audit or reproduce.
Rebuilt as a documented R pipeline with a single source taxonomy and versioned outputs, then trained the in-house team to own and extend it.
A monthly cycle that once meant five days of manual assembly now completes in about an hour — and the client's own team owns it.
Headline polls were being read as certainties. The need was a defensible probabilistic view that survived methodological scrutiny.
A Monte Carlo model with house-effect corrections and D'Hondt seat allocation, reporting credible intervals — and publishing the misses alongside the hits.
Forecasts read with honest uncertainty, not false precision — credible intervals readers actually trusted, with every miss published in full.
A reform debate needed to know how localities compared, and what amalgamation would mean in practice, across thousands of administrative units.
A road-distance and accessibility analysis joining administrative, demographic, and spatial data into a single benchmarking layer decision-makers could query.
Thousands of administrative units benchmarked on one queryable layer — a policy debate mapped onto the ground.
The same four steps on every engagement — so what you receive is a working system, not a slide deck and a goodbye.
Start from the decision you're trying to make and work back to the data. If it won't support the claim, I say so early.
Explore and clean the sources — administrative registries, statistical and tax records, electoral formats — and agree a standardization plan before any modelling.
Reproducible R workflows with uncertainty quantified. Every output traces back to documented, version-controlled code.
Your team inherits the pipeline, the documentation, and the training to run it — fully self-sufficient, with no dependence on me to keep it going.
A senior, independent hand on government data from the first scoping call: seventeen years across urban and regional planning, public policy, and data science, first as a planner and analyst, then as founder of a territorial-analytics practice, and now working independently.
My work spans public finance, education, territorial reform, elections, public administration, and monitoring and evaluation (M&E). I work almost entirely in R (documented pipelines, version control, Shiny, Quarto, GIS), with a bias toward reproducibility: outputs a client's team can re-run without me. I know the administrative data systems, including where they're inconsistent and how they break, which is usually where the real work is. I'm as comfortable with a policy team as with a technical one.
I publish openly through my data-journalism newsletter, Din date adunate, because the fastest way to show how I reason is to do it in public.
The same forecasting and territorial analysis I do under contract — published openly, with the workings shown and the misses owned. Posts are written in Romanian; an English summary sits under each title.
Two days out, a 60% favourite turns into a genuine toss-up. What the model saw, and why it stopped being sure.
An honest post-mortem on a forecast that missed, the polls behind it, and why getting it wrong is part of the method.
National exam results mapped against where students live — the inequality the averages hide.
Send me the decision, the deadline, the data sources, and the institutional context. I'll tell you where I can help — and what the data can, and can't, support.
contact@alexghita.eu