Election Data Transparency: From Public Results to Forensic-Ready Democracy – Part I


1. Election data are democratic infrastructure

Election results are often treated as the final line of an electoral process: votes are counted, winners are announced, complaints are filed or dismissed, and political life moves on. But in a modern democracy, the publication of election results is not merely an administrative ritual. It is part of the democratic infrastructure itself. Citizens, parties, observers, journalists, courts, and researchers cannot meaningfully evaluate an election unless they can see what happened at the level where the election actually took place: the polling station.

This requirement is especially important in weak, hybrid, or partially democratic systems. Stable democracies can sometimes tolerate decentralised reporting, slower publication, or less centralised polling-station-level data because public trust has been built over many electoral cycles. Low-trust systems do not have that luxury. Where ruling parties dominate institutions, where media are polarised, and where citizens suspect that official results may be manipulated, election authorities must provide more transparency than is normally required in consolidated democracies, not less. Trust is not restored by asking citizens to believe official announcements. It is restored by allowing them to verify the numbers.

That is why the format and timing of election data matter. A scanned protocol uploaded days later may formally support the claim that “the data are public”, but it does not allow rapid analysis. A PDF image cannot easily be imported into R or Python. It cannot be checked within minutes for unusual turnout patterns, implausible vote shares, inconsistent invalid-ballot rates, or discrepancies between registered voters and ballots cast. By contrast, a clean polling-station-level CSV file, published soon after polling stations close, allows election observers and analysts to move from suspicion to evidence.

This first part of the series explains what good election-data transparency actually means. It is not enough for election data to be visible. They must be timely, machine-readable, documented, and usable. In other words, they must be ready for forensic analysis. The question is not whether technology should replace trust. The question is whether societies with fragile electoral trust should continue to accept election data in forms that prevent timely public verification.

2. Public visibility is not the same as forensic usability

The first conceptual distinction is simple but often neglected: publicly visible data are not necessarily usable data.

An election commission may publish results on a website. It may show charts, maps, percentages, and party rankings. It may upload scanned polling-station protocols. It may even publish hundreds or thousands of files. Yet this does not automatically mean that the data are suitable for forensic analysis.

Forensic usability requires more than visibility. It requires that data can be downloaded, imported, checked, merged with other variables, and analysed quickly. A researcher or observer should be able to ask: How does turnout vary across polling stations? Are there unusually high turnout clusters? Do vote shares for a ruling party rise systematically with turnout? Are invalid ballots concentrated in particular municipalities? Are registered voters, ballots cast, valid votes, and invalid votes internally consistent? Do the results differ from historical patterns?
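As a rough illustration, the first two of these questions can be asked of a clean file in a few lines of Python. The sketch below uses only the standard library. The column names (`station_id`, `registered`, `ballots_cast`, `valid_votes`, `list_a`), the sample rows, and the 90% threshold are all invented for illustration and do not come from any real election commission.

```python
import csv
import io

# Hypothetical polling-station file: one row per station, one column per variable.
SAMPLE_CSV = """station_id,registered,ballots_cast,valid_votes,invalid_votes,list_a,list_b
PS-001,1000,520,510,10,250,260
PS-002,800,430,420,10,200,220
PS-003,1200,610,600,10,310,290
PS-004,900,880,875,5,820,55
"""


def load_stations(text):
    """Parse the CSV text and add turnout and list A's vote share to each row."""
    stations = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {"station_id": row["station_id"]}
        for name, value in row.items():
            if name != "station_id":
                record[name] = int(value)
        record["turnout"] = record["ballots_cast"] / record["registered"]
        record["share_a"] = record["list_a"] / record["valid_votes"]
        stations.append(record)
    return stations


def high_turnout(stations, threshold=0.9):
    """Return identifiers of stations whose turnout exceeds the (assumed) threshold."""
    return [s["station_id"] for s in stations if s["turnout"] > threshold]


stations = load_stations(SAMPLE_CSV)
flagged = high_turnout(stations)

for s in sorted(stations, key=lambda s: s["turnout"], reverse=True):
    print(f'{s["station_id"]}  turnout={s["turnout"]:.1%}  list A={s["share_a"]:.1%}')
print("Stations above threshold:", flagged)
```

A flag of this kind is a prompt to look more closely at the station's protocol, not evidence of manipulation in itself.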

These questions require polling-station-level data in a structured format. A scanned protocol may be valuable as legal evidence. It may show the signatures of polling-board members. It may provide a documentary trail. But it is not a substitute for a clean dataset. A country can therefore have documentary transparency without having forensic transparency.

This distinction is central for Serbia and similar cases. If the only available source is a photographed or scanned protocol, observers must first download thousands of files, read or OCR them, manually correct errors, standardise names, reconstruct polling-station identifiers, and only then begin analysis. By the time this work is completed, complaint deadlines may already have expired. In that situation, transparency exists too late and in the wrong format.

The strongest systems avoid this problem by publishing both forms of evidence: scanned protocols as documentary confirmation, and machine-readable data as analytical infrastructure. The scanned document answers the question: “What was written and signed at the polling station?” The dataset answers the question: “What patterns emerge across all polling stations?” Both are needed, but they serve different purposes.

3. What “good election-data availability” means

Good election-data availability has four layers.

The first layer is timeliness. Turnout data should be available during election day where appropriate, and results should be available immediately after polling stations close. Turnout updates, reporting progress, preliminary results, and final results should be clearly separated. Early data should be labelled as preliminary; final data should be labelled as certified. Corrections should be documented.

The second layer is granularity. National totals are not enough. Regional totals are not enough. Municipality-level data are useful, but still insufficient for many forensic tools. The basic unit of election forensics is the polling station. That is where ballots are counted, protocols are completed, and irregularities may occur. If polling-station-level data are missing, many diagnostic tools become weaker or impossible.

The third layer is machine-readability. Data should be downloadable in formats such as CSV, clean XLSX, XML, JSON, API output, or well-structured HTML. The critical issue is not the file extension itself, but whether each row and each column has a clear meaning. A spreadsheet with merged cells, decorative headings, repeated subtotals, or inconsistent column names may be technically an Excel file but still analytically poor.
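One way to make "a clear meaning for each row and column" concrete is a quick structural check on the file. The following is a minimal sketch, not a complete validator: it assumes a hypothetical identifier column called `station_id`, assumes every other column should hold whole numbers, and uses made-up sample rows.

```python
import csv
import io

CLEAN = """station_id,registered,ballots_cast
PS-001,1000,520
PS-002,800,430
"""

# Same layout, but with a duplicated row, a non-numeric cell and a short row.
MESSY = """station_id,registered,ballots_cast
PS-001,1000,520
PS-001,1000,520
PS-002,800,n/a
PS-003,900
"""


def rectangularity_problems(text, id_column="station_id"):
    """List structural problems that would block direct import of the file."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or id_column not in rows[0]:
        return ["first row is not a header with column " + repr(id_column)]
    header = rows[0]
    id_pos = header.index(id_column)
    problems = []
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        station = row[id_pos]
        if station in seen:
            problems.append(f"line {line_no}: duplicate identifier {station}")
        seen.add(station)
        for name, value in zip(header, row):
            if name != id_column and not value.isdigit():
                problems.append(
                    f"line {line_no}: column {name} is not a whole number: {value!r}"
                )
    return problems


for problem in rectangularity_problems(MESSY):
    print(problem)
```

A file that passes a check like this can usually be loaded into statistical software without manual cleaning; a file that fails it may be public, but it is not yet analytically usable.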

The fourth layer is documentation. A dataset should include metadata: definitions of variables, polling-station identifiers, geographic codes, party/list identifiers, turnout definitions, and revision history. Without documentation, analysts can misread the data. For example, “ballots cast”, “votes cast”, “valid votes”, and “voters who voted” may not be identical concepts in every electoral system. The difference matters.

When all four layers are present (timely publication, polling-station granularity, machine-readable format, and documentation), election data become forensic-ready. This does not mean that fraud has occurred, or that forensic analysis will always find anomalies. It means that society has the tools to check the election independently.

4. Live data: useful, but easily misunderstood

Live election data can strengthen transparency, but they must be interpreted carefully. Live data may include turnout during the day, number of polling stations reporting, preliminary results after counting begins, or real-time updates from an election commission dashboard. They may also include observer-collected data from NGOs or party networks.

Official live dashboards can be powerful. South Africa’s Electoral Commission provides a good example: its results portal states that live election results are displayed on election night and updated every 5–10 minutes after all voting stations close; it also allows users to explore results through maps, tables, charts, and historical dashboards.

But live dashboards can also mislead if they are poorly designed. Early reporting is rarely random. Urban polling stations may report faster than rural ones. Small polling stations may finish counting earlier than large ones. Strongholds of one party may appear before strongholds of another. If a dashboard shows early percentages without clearly showing reporting progress, citizens may misinterpret temporary patterns as final results.

This is why live transparency requires context. A responsible dashboard should show how many polling stations have reported, where they are located, how many registered voters they cover, when the data were updated, and whether the results are preliminary or final. It should also allow users to download the underlying data. A dashboard that only shows attractive graphics but blocks access to the dataset is not enough.
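The context described above can be computed directly from the underlying data. The sketch below assumes a hypothetical record layout in which a station that has not yet reported simply has no vote counts. The station names, areas, and figures are invented, and a two-list contest is assumed to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    station_id: str
    area: str
    registered: int
    list_a: Optional[int] = None  # None means the station has not reported yet
    list_b: Optional[int] = None


def reporting_progress(stations):
    """Summarise how much of the electorate the partial results actually cover."""
    reported = [s for s in stations if s.list_a is not None and s.list_b is not None]
    votes_a = sum(s.list_a for s in reported)
    valid = votes_a + sum(s.list_b for s in reported)
    return {
        "stations_reported": len(reported),
        "stations_total": len(stations),
        "registered_covered": sum(s.registered for s in reported)
        / sum(s.registered for s in stations),
        "share_a_so_far": votes_a / valid if valid else None,
        "areas_reported": sorted({s.area for s in reported}),
    }


# Invented situation: the two urban stations report first, the rural ones are pending.
stations = [
    Station("PS-001", "urban", 1000, list_a=300, list_b=200),
    Station("PS-002", "urban", 1000, list_a=280, list_b=220),
    Station("PS-003", "rural", 1500),
    Station("PS-004", "rural", 1500),
]

summary = reporting_progress(stations)
print(summary)
```

In this invented case list A stands at 58% of counted votes, but only urban stations covering 40% of registered voters have reported. Showing the two figures side by side is what keeps the early percentage from being read as a final result.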

In low-trust systems, live data have an additional role. They reduce the time window in which manipulation can occur unnoticed. If observers, journalists, and parties can compare official updates with polling-station protocols and independent observer reports, suspicious discrepancies can be identified quickly. This is not only a technical advantage; it changes the balance of accountability.

5. Technical prerequisites, explained without jargon

Election-data transparency does not require magical technology. It requires a disciplined information system.

At the base of that system is a stable list of polling stations. Each polling station should have a unique identifier that remains stable across elections or, if changed, is documented. Without stable identifiers, it becomes difficult to compare turnout and vote shares over time.
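To show why documented identifier changes matter, here is a small sketch that matches polling stations across two elections. The identifiers and the `documented_changes` mapping (old identifier to new identifier) are hypothetical; a real system would take them from the election authority's published station list.

```python
def match_stations(previous_ids, current_ids, documented_changes):
    """Split stations into comparable, disappeared, and new or undocumented.

    documented_changes maps an old identifier to its new identifier.
    """
    previous_mapped = {documented_changes.get(i, i) for i in previous_ids}
    current = set(current_ids)
    return {
        "comparable": sorted(previous_mapped & current),
        "disappeared": sorted(previous_mapped - current),
        "new_or_undocumented": sorted(current - previous_mapped),
    }


# Invented example: PS-002 was renamed to PS-002A and the change was documented.
previous = ["PS-001", "PS-002", "PS-003"]
current = ["PS-001", "PS-002A", "PS-004"]
changes = {"PS-002": "PS-002A"}

result = match_stations(previous, current, changes)
print(result)
```

Without the documented mapping, PS-002 and PS-002A would look like one closed station and one new station, and their turnout history could not be compared.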

The next layer is data collection. Results from polling stations must enter a central system. This can happen through secure digital reporting, scanned protocols, manual entry with verification, or a combination of channels. The key is that every number should be traceable back to the polling station protocol.

Then comes validation. A good system checks whether the numbers make sense: whether votes exceed registered voters, whether valid and invalid ballots add up correctly, whether candidate votes sum to total valid votes, and whether missing values are flagged. Validation does not eliminate fraud, but it reduces administrative error and makes anomalies visible.
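The checks listed above translate almost directly into code. The following is a minimal sketch under stated assumptions: the field names are hypothetical, and the rule that valid plus invalid ballots equals ballots cast is assumed for illustration, since the exact accounting identities depend on each electoral system's definitions.

```python
def validate_station(record):
    """Return a list of consistency problems for one polling-station record."""
    required = ("registered", "ballots_cast", "valid_votes", "invalid_votes", "votes")
    missing = [name for name in required if record.get(name) is None]
    if missing:
        # Flag missing values rather than guessing them.
        return ["missing values: " + ", ".join(missing)]

    problems = []
    if record["ballots_cast"] > record["registered"]:
        problems.append("ballots cast exceed registered voters")
    # Assumed identity; adjust to the definitions in the relevant electoral law.
    if record["valid_votes"] + record["invalid_votes"] != record["ballots_cast"]:
        problems.append("valid + invalid ballots do not equal ballots cast")
    if sum(record["votes"].values()) != record["valid_votes"]:
        problems.append("votes by list do not sum to valid votes")
    return problems


# Invented records for illustration.
consistent = {"registered": 1000, "ballots_cast": 520, "valid_votes": 510,
              "invalid_votes": 10, "votes": {"A": 250, "B": 260}}
inconsistent = {"registered": 500, "ballots_cast": 520, "valid_votes": 510,
                "invalid_votes": 5, "votes": {"A": 250, "B": 250}}
incomplete = {"registered": 700, "ballots_cast": None, "valid_votes": 400,
              "invalid_votes": 5, "votes": {"A": 400}}

for name, rec in [("consistent", consistent), ("inconsistent", inconsistent),
                  ("incomplete", incomplete)]:
    print(name, validate_station(rec))
```

A failed check may reflect a transcription error as easily as anything more serious. The point is that the record is flagged and traced back to its protocol.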

The fourth layer is publication. Data should be published in a form that ordinary users can inspect and analysts can download. The strongest systems include dashboards, downloadable files, and sometimes APIs. Brazil’s electoral data infrastructure is important in this respect because the Superior Electoral Court maintains an open-data portal with election datasets, including result-related resources such as ballot-box bulletin datasets and CSV files.

The fifth layer is auditability. Every update should have a timestamp. Corrections should be logged. Preliminary and final versions should be preserved. If a number changes, users should be able to see when and why. This protects both citizens and election authorities. It reduces suspicion because it makes the process visible.
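A revision log of this kind can be produced mechanically by comparing two published versions of the data. The sketch below assumes each version is a mapping from a station identifier to its figures; the station identifiers, numbers, timestamp, and reason text are all invented for illustration.

```python
from datetime import datetime, timezone


def diff_snapshots(old, new, published_at, reason):
    """Return one log entry for every figure that differs between two versions."""
    log = []
    for station_id in sorted(set(old) | set(new)):
        before = old.get(station_id, {})
        after = new.get(station_id, {})
        for field in sorted(set(before) | set(after)):
            if before.get(field) != after.get(field):
                log.append({
                    "station_id": station_id,
                    "field": field,
                    "old": before.get(field),
                    "new": after.get(field),
                    "published_at": published_at,
                    "reason": reason,
                })
    return log


# Invented preliminary and corrected versions of the same two stations.
preliminary = {
    "PS-001": {"valid_votes": 510, "list_a": 250},
    "PS-002": {"valid_votes": 420, "list_a": 200},
}
corrected = {
    "PS-001": {"valid_votes": 510, "list_a": 205},
    "PS-002": {"valid_votes": 420, "list_a": 200},
}

stamp = datetime(2024, 1, 1, 22, 30, tzinfo=timezone.utc).isoformat()
log = diff_snapshots(preliminary, corrected, stamp, "digits transposed at data entry")
for entry in log:
    print(entry)
```

Because both versions are preserved, anyone can rerun the comparison and confirm that the published log is complete.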

Finally, cybersecurity matters, but it should not be used as an excuse for secrecy. Secure systems and open data are not opposites. A system can protect internal transmission channels while still publishing public results in a usable format. Indeed, secrecy about public results may increase suspicion rather than reduce risk.

6. Legal prerequisites: transparency must be mandatory

Technical capacity alone is not enough. If election-data publication depends on the goodwill of officials, it remains fragile. In low-trust systems, transparency must be a legal obligation.

Election laws should specify that polling-station-level results must be published. They should define what variables must be included: registered voters, voters who voted, ballots cast, valid votes, invalid ballots, votes by party/list/candidate, polling-station identifiers, municipality, district, and relevant special categories such as diaspora, postal, early, or absentee voting where applicable.

The law should also define timing. Data should be published early enough for parties, candidates, NGOs, and citizens to use them before complaint and appeal deadlines expire. This point is crucial. Data published after the legal window for contestation closes may be useful for historians, but it is much less useful for electoral accountability.

The law should define format. It should not be enough to publish scanned PDFs or photographs. Those should be published as documentary evidence, but machine-readable data should also be required. A modern standard would require CSV or XLSX files at minimum, and preferably APIs or structured open-data portals.

The law should also define equality of access. Election data should not be available only to parties with privileged contacts, or to institutions with special arrangements. NGOs, journalists, universities, researchers, parties, and citizens should all have the same access to the same data at the same time.

This is where transparency becomes institutional, not personal. A good system does not depend on whether the current election commission is cooperative. It depends on rules that bind every future commission.

7. NGOs and observer coalitions: from monitoring to data infrastructure

Traditional election observation often focused on visible election-day procedures: opening of polling stations, presence of materials, secrecy of the vote, counting procedures, intimidation, and irregularities. These remain essential. But modern observation increasingly includes data infrastructure.

NGOs and observer coalitions can collect structured information from polling stations using mobile apps, SMS forms, web platforms, or call centres. Observers can report turnout, incidents, protocol figures, photographs of official forms, and counting results. If the system is designed well, observer data can be aggregated quickly and compared with official data.

Parallel Vote Tabulation, or PVT, is one of the most important tools in this field. In a PVT, trained observers collect results from a statistically designed sample of polling stations. If the sample is properly drawn and observers report accurately, the PVT can estimate whether official results are consistent with polling-station evidence. Nigeria’s Yiaga Africa, for example, reported that its 2023 presidential election statement was based on reports from 1,454 of 1,507 sampled polling units, or 97% of the sample, and that its deployment strategy allowed it to independently assess official presidential results.
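The statistical core of a PVT can be sketched in a few lines. The example below is a simplified illustration, not the method of any particular observer group. It assumes a simple random sample of polling stations, whereas real PVTs typically use stratified designs. It applies a standard textbook ratio estimator with a normal-approximation interval, and every number in it is invented. A sample of five stations is far too small in practice and is used here only for readability.

```python
import math


def pvt_estimate(sample, total_stations, z=1.96):
    """Estimate a candidate's vote share from sampled stations.

    sample: list of (valid_votes, candidate_votes) pairs, one per sampled station.
    Returns (share, lower bound, upper bound) under simple random sampling.
    """
    n = len(sample)
    total_valid = sum(valid for valid, _ in sample)
    total_candidate = sum(cand for _, cand in sample)
    share = total_candidate / total_valid

    # Station-level deviations from the estimated share drive the uncertainty.
    residual_ss = sum((cand - share * valid) ** 2 for valid, cand in sample)
    fpc = 1 - n / total_stations  # finite population correction
    std_error = math.sqrt(fpc * n / (n - 1) * residual_ss) / total_valid
    return share, share - z * std_error, share + z * std_error


# Invented observer reports: (valid votes, votes for candidate A) per sampled station.
sample = [(500, 260), (400, 190), (600, 330), (450, 220), (550, 280)]
share, low, high = pvt_estimate(sample, total_stations=1000)

print(f"PVT estimate for candidate A: {share:.1%} (range {low:.1%} to {high:.1%})")
for official in (0.52, 0.60):
    verdict = "consistent" if low <= official <= high else "outside the PVT range"
    print(f"Official share {official:.0%}: {verdict}")
```

An official figure inside the range is consistent with the observer evidence. A figure outside it does not prove manipulation, but it is a discrepancy that requires explanation.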

NGO reporting is not a replacement for official results. It is a cross-check. That distinction is important. The election commission remains responsible for official tabulation. But NGOs can make manipulation more difficult by creating an independent evidence stream. If official data and observer data diverge, the discrepancy itself becomes a public issue requiring explanation.

Kenya’s 2022 election illustrates the importance of public result transmission and visibility in contested environments. The Carter Center’s expert mission focused on election technology and concluded that technology helped enhance transparency, while also recommending additional safeguards such as better preparedness, stronger verification mechanisms, digital signatures, and risk-limiting audits.

For Serbia and similar low-trust systems, this lesson is especially relevant. NGOs should not only observe and publish narrative reports. They should build or support systems capable of collecting structured polling-station data rapidly, storing protocol images, generating public dashboards, and running first-line forensic checks within hours after polls close.

8. Why low-trust systems need more transparency than stable democracies

One of the most important arguments in this series is that the same technical weakness has different political meaning in different contexts.

In a consolidated democracy, citizens may tolerate slower publication or decentralised administration because the system has repeatedly demonstrated credibility. The United States, for example, does not have a single national election authority publishing immediate national polling-station-level results. The United Kingdom traditionally reports results at constituency level. These arrangements may be suboptimal for centralised election forensics, but they operate within political systems where trust has been built over long periods.

In a hybrid or low-trust system, the same arrangement would be much more problematic. If citizens already suspect pressure on voters, media capture, abuse of public resources, inflated turnout, or manipulation during aggregation, then telling them that data will be published later, in scanned form, or only through a non-downloadable interface will not build confidence. It will do the opposite.

For that reason, NGOs and observer coalitions in low-trust systems are justified in asking for more than might be considered necessary in stable democracies. They should ask for polling-station-level data, clean machine-readable files, real-time or near-real-time dashboards, scanned protocols, timestamps, revision logs, APIs, and independent verification channels. This is not an excessive demand. It is a rational response to weak public trust.

The goal is not permanent suspicion. The goal is to create a system that can gradually reduce suspicion. After several election cycles in which data are published quickly, cleanly, and consistently, public trust may improve. Transparency is therefore not only a diagnostic tool. It is also a trust-building mechanism.

9. Serbia and the problem of PDF transparency

Serbia’s problem is not simply whether some election documents are public. The deeper issue is whether election data are published in a way that allows timely, independent, polling-station-level analysis.

A scanned polling-station protocol can be useful. It can show what was signed at the polling station. It can help document disputes. It can be compared with tabulated results. But if the broader dataset is not available in clean, downloadable form before complaint deadlines expire, then election forensics becomes slower than the legal process. That is exactly the wrong order. Evidence should be available while it can still matter.

The Serbian debate should therefore move beyond the minimum question: “Are results published somewhere?” The real question should be: “Can citizens, observers, journalists, and researchers download the polling-station-level data, import them into statistical software, verify the totals, compare them with protocols, detect anomalies, and publish a preliminary forensic report within hours?”

If the answer is no, then the system is not forensic-ready.

10. Conclusion: from announced results to verifiable results

Election transparency should no longer be measured only by whether results are announced. In the twenty-first century, the standard must be higher. Results should be public, timely, granular, machine-readable, documented, and independently checkable.

This does not mean that every anomaly proves manipulation. Election forensics is not a machine for declaring fraud. It is a set of tools for identifying patterns that require explanation. But those tools cannot work without data.

For consolidated democracies, forensic-ready data would modernise transparency. For hybrid and low-trust systems, the need is more urgent: such data are a mechanism for rebuilding public confidence. Citizens who do not trust institutions should not be asked simply to believe official results. They should be given the data needed to verify them.

The next part of this series will compare selected countries from Latin America, Africa, Europe, and established democracies. The question will not be which country is “more democratic” in general. The question will be more precise: which countries publish election data in a form that allows rapid independent verification, and what Serbia can learn from them.

Internet resources

Brazil: https://dadosabertos.TSE.jus.br/dataset/

Kenya – Carter Center: https://cartercentee50c07c05.blob.core.windows.net/blobcartercentee50c07c05/wp-content/uploads/2023/03/kenya-2022-elections-final-report.pdf

Nigeria: https://wilanglobal.org/wp-content/uploads/2023/08/Yiaga-Africa-Post-Election-Statement-on-2023-Presidential-Election.pdf

South Africa: https://results.elections.org.za/

Tech Appendix: A short guide for non-specialists

Scanned PDF or image-based protocol
This is a photograph or scan of a polling-station document. It is useful as documentary evidence, but it is not immediately usable for statistical analysis. Someone must read or extract the numbers first.

Searchable PDF
This is better than a scanned image because text can be selected or searched. However, it is still usually weaker than a proper dataset because tables may not import cleanly into statistical software.

Messy Excel file
An Excel file is not automatically good data. If it contains merged cells, decorative headings, repeated subtotals, inconsistent names, or multiple tables on one sheet, it may require substantial cleaning.

Clean CSV/XLSX
This is a proper rectangular dataset. Each row is one polling station. Each column is one variable. It can usually be imported directly into R, Python, Stata, SPSS, or Excel.

JSON/API access
This allows software to retrieve data automatically from a web service. It is especially useful for live dashboards, repeated updates, and reproducible monitoring.

XML
XML is another structured data format. It is machine-readable and can be useful when properly documented, although it may be less familiar to general users than CSV or Excel.

Structured open-data portal
This is a public website where datasets are searchable, downloadable, documented, and reusable. Good portals include metadata, update dates, licences, and clear dataset descriptions.

Metadata
Metadata are “data about the data”: definitions of variables, codes, dates, file versions, polling-station identifiers, and explanations of how the dataset was produced.

Timestamp
A timestamp shows when data were uploaded or updated. It is essential for tracking preliminary results and corrections.

Revision log
A revision log records changes. If a result is corrected, the public should be able to see what changed, when, and why.

Open-data licence
A licence explains whether citizens, journalists, NGOs, and researchers are legally allowed to reuse, analyse, and republish the data.
