Welcome to Mexico City. Grab your rental car and hit the road to start your two-week road trip. The itinerary suggested by Lonely Planet is amazing: 10 cities in 15 days. Each one of them is no more than a four-hour drive from the next. What you may not be aware of is that following that itinerary by car requires complying with up to 12 different sets of traffic laws and regulations. A dozen legal instruments regulate your journey.

Because Mexico is a federation, regulating transit is a responsibility of subnational governments: the Mexican states and Mexican municipalities. The Lonely Planet route takes you through six or more states; each one has its own state law and transit code. Moreover, some municipalities along the route also have their own transit code.

More than an annoyance for international travelers, this byzantine system of regulation creates recurring corruption risks for Mexican citizens. In metropolitan areas, a daily routine can mean driving through two or three different municipalities: going downtown for work, picking up kids at school, or having a business dinner in a trendy neighborhood of the city. According to Transparencia Mexicana’s National Index on Corruption and Good Governance, avoiding getting a ticket for a transit violation is the most frequent act of corruption in the country.

Under the TESTING 1 2 3: Innovation Fund of Global Integrity, Transparencia Mexicana proposed to tackle this problem by analyzing the similarities and differences between transit codes in Mexican cities. Although we are still in the first stage of the project, we have identified 18 different transit regulations governing just six metropolitan areas. Mexico has 59 metropolitan areas, covering almost 60% of the population. If the trend holds across all of these areas, we estimate that a driver could face 150 different ways of interacting with a police officer. That creates plenty of room for corruption and police extortion.

-- Eduardo Bohórquez and Rafael García, Transparencia Mexicana

The Path to VoxPolitico

A few months ago we started a project that aimed to convert speeches from the Macedonian Parliament (represented in textual form) into a meaningful format that would be understandable by a more general audience. After months of hard work, on the 4th of November we arrived at a point where we could present the fruits of our labor on the VoxPolitico website.

The website combines text-mining algorithms with visualization techniques in order to convert every speech in the Macedonian Parliament into a set of visual cues. The goal is for non-experts to be able to understand what the legislature was focused on in a given period of time and how individual politicians “changed their stripes” (or not) as the political winds in the country shifted. Users can now see what has been said in every session of the Macedonian Parliament going back nearly twenty years. They can also analyze trends and identify “hot” topics raised by lawmakers in speeches and debates. Fundamentally, what we’re trying to do is demystify the often-lofty political rhetoric that dominates the standard vocabulary of the Macedonian (and other) legislatures.


VoxPolitico consists of two main parts: the web crawler and the website.

We have implemented a web crawler that runs in the background and continuously monitors the official website of the Macedonian Parliament. Whenever a new transcript of a parliamentary session is published, the crawler downloads the file and processes the information. Meaningful information extracted from these transcripts is organized and stored in two different database management systems (a relational and a NoSQL database).
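As a rough illustration of that polling loop, here is a minimal sketch in Python. It is not the actual VoxPolitico crawler: the function names, the file names, and the in-memory stand-ins for the parliament website and the databases are all assumptions made so the example runs without network access.

```python
def find_new_transcripts(listed_urls, seen_urls):
    """Return listed URLs that have not been processed yet, in listing order."""
    return [url for url in listed_urls if url not in seen_urls]


def crawl_once(list_transcripts, download, process, seen_urls):
    """One polling pass: download and process each unseen transcript, then mark it seen."""
    new_urls = find_new_transcripts(list_transcripts(), seen_urls)
    for url in new_urls:
        process(download(url))
        seen_urls.add(url)
    return new_urls


# Hypothetical stand-ins: a dict plays the role of the parliament website,
# and a list plays the role of the processing and storage step.
fake_site = {
    "session-1.doc": "text of session 1",
    "session-2.doc": "text of session 2",
}
processed = []
seen = {"session-1.doc"}  # already handled on an earlier pass

new = crawl_once(lambda: list(fake_site), fake_site.__getitem__, processed.append, seen)
```

In a real deployment the three callables would wrap an HTTP client, a transcript parser, and the database writes, and `crawl_once` would be run on a schedule.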

The information that we are extracting is statistical in nature: descriptive statistics, relationships, rules, and time series data. This kind of information is presented to the general audience using visual cues in the form of graphs and shapes on the user-friendly front-end website. To build the front-end we implemented a feedback-driven development process. Prospective users evaluated all our visuals; their feedback was used to alter our presentation methods to better suit their needs. For this purpose we presented the early system to large audiences at workshops, seminars, and international conferences, all the while gathering feedback to improve the platform.

What information the system presents

Our aim was to develop a system that would promote transparency in the Macedonian political ecosystem. We envisioned VoxPolitico providing the following information to the general audience:

  • General statistics that summarize each parliamentary session of the Macedonian parliament. This way we can present the most popular topic being discussed, the phrases used most often, as well as the most active and passive members of the legislature (according to volume of speech).
  • Trends for every speech. We have collected information such as the exact date of the speech, the name of the representative giving the speech, word and phrase frequency, etc. This kind of information allows us to generate trends in the form of “hot” topics being invoked or debated in given periods of time in Parliament. Additionally one can execute comparative statistics against a set of topics (e.g. looking for trends involving more than one word).
  • Similarities between members of the legislature. A very interesting feature of our system is the ability to evaluate the similarity between representatives according to their political speech in Parliament. By applying cosine similarity against document vectors, we can determine the similarity between two politicians based on the speeches they have delivered in Parliament during their career.  
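To make the similarity measure from the last item concrete, here is a minimal Python sketch of cosine similarity over bag-of-words document vectors. It illustrates the general technique only. The representative names, the sample speeches, and the plain word-count vectors are assumptions, not the vectors VoxPolitico actually builds.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words vector (term -> count) from whitespace-separated text."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(count * vec_b[term] for term, count in vec_a.items() if term in vec_b)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: all speeches of a representative joined into one string.
speeches = {
    "rep_a": "budget taxes budget reform",
    "rep_b": "budget reform education",
    "rep_c": "agriculture subsidies farmers",
}
vectors = {name: term_vector(text) for name, text in speeches.items()}

sim_ab = cosine_similarity(vectors["rep_a"], vectors["rep_b"])  # shared vocabulary
sim_ac = cosine_similarity(vectors["rep_a"], vectors["rep_c"])  # no shared vocabulary
```

A score near 1 means two representatives use much the same vocabulary in similar proportions; a score of 0 means their speeches share no terms at all.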

What we do not show

When calculating the statistical properties of the downloaded documents, we realized that the statistics were skewed towards some frequently occurring words mentioned in Parliamentary debate. For example, the phrase “thank you” appears in every speech and thus has a very high frequency, but it is meaningless for our purposes. To overcome this issue we adopted a two-step approach:

- We eliminated words based on their importance using the well-known Term Frequency - Inverse Document Frequency (tf-idf) algorithm; and

- We created a blacklist of words that are simply not processed by VoxPolitico; this helps us eliminate words that we deem to be unimportant or marked as unimportant by our users.
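The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than VoxPolitico's implementation: it uses one common tf-idf formulation (relative term frequency times the natural log of the inverse document frequency), and the blacklist entry, the threshold, and the sample speeches are hypothetical.

```python
import math
from collections import Counter

# Hypothetical blacklist; the actual VoxPolitico list is not given in this post.
BLACKLIST = {"colleagues"}


def tf_idf(documents):
    """Return one {term: tf-idf score} dict per document (each a list of tokens)."""
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def important_terms(documents, threshold=0.0):
    """Keep terms that score above the threshold and are not blacklisted."""
    return [
        {t for t, s in doc_scores.items() if s > threshold and t not in BLACKLIST}
        for doc_scores in tf_idf(documents)
    ]


speeches = [
    "thank you colleagues budget".split(),
    "thank you colleagues education".split(),
    "thank you health".split(),
]
kept = important_terms(speeches)
```

Because “thank” and “you” occur in every sample speech, their inverse document frequency is log(1) = 0, so they drop out in the first step; “colleagues” scores above zero but is removed by the blacklist in the second.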

As previously mentioned, we built the system to increase accountability and transparency in the Macedonian Parliament. During some of the early public presentations we organized to share the platform, it was no surprise that people wanted to know whether VoxPolitico would also process and visualize the voting records of representatives alongside their speeches. Processing voting records would indeed be an enhancement to our system but would require a separate major effort. Parliamentary voting records in Macedonia are maintained only in hard copy by both the Parliament’s archive and the national library.

How much data is there?

We have processed data going back to the first parliamentary session of the independent Macedonian state in 1991; the first document recorded in VoxPolitico dates to January 8th, 1991. To date, more than 20,000 speeches have been retrieved, containing more than 100,000 distinct words. Some of VoxPolitico’s tables containing the statistical properties of our data have up to 20 million records.

One mistake we made early on was to overlook the challenges of big data. We believed that a standard relational database would suffice to store all the information. We were quickly proven wrong; the website would take forever to display the information we had stored. Therefore, a whole redesign of the platform was needed, and we substituted non-relational databases for the standard relational one. A separate technical post will cover these issues and how we overcame them.           

What next?

One of our major goals was to create an architecture and a platform that could be used by anyone, anywhere, to generate similar insights for any legislature in the world. With that in mind, our next task is to publish VoxPolitico under a suitable open source license and to make it available to anyone. For this to work we will be writing technical documentation and eventually uploading the final software package to an open source software repository. In the meantime, we are excited that non-governmental organizations from neighboring countries have already reached out to us to express their interest in implementing VoxPolitico in their countries. We are already working with them to support the setup of the system, assisting with the implementation of custom parsers to scrape their legislatures’ websites, and offering limited hosting infrastructure. We’re also busy working to localize the system in other languages, starting with Albanian (the second official language in Macedonia).

-- Visar Shehu, South East European University

Building in Belgrade: A Case Study in Freedom of Information Helping Open Data


Our TESTING 1 2 3 Innovation Fund project in the capital city of Belgrade, Serbia aims to make urban planning procedures and practices more accessible to citizens via a web portal. Unsurprisingly, our project and its theory of change depend heavily on the availability of urban planning documents in electronic form. This post explores how our simple need for those digital documents has turned into a textbook case of how the freedom of information and open data communities can work together to advance a common transparency and participation agenda.

Building in Belgrade

The current situation in Belgrade with respect to the procedures for submitting and receiving approval for urban plans is grim. Plans must be presented in person to the City Department for Urban Planning and Construction in analog form during working hours, and citizens can only submit complaints about the plans in written form. All of this minimizes the chances for the broader public to become involved in expressing their interests and opinions around the future of their neighborhoods. It also increases the opportunity for potential acts of corruption in the urban planning process: construction companies, their investors, and senior government officials are the only ones with the skills and experience necessary to navigate this highly technical and arcane process. Opportunities for collusion are rife.

Besides urban plans themselves, which consist of graphics and text components, other extremely important documents are the minutes of the meetings of the City Planning Commission, the forum in which a draft plan is adopted and put out for public comment. Those minutes contain the ostensible rationale for pursuing the project to begin with but often hide important details about special interests backing the development project.

Asking for the Plans

Bearing all this in mind, we knew that we would have a difficult time obtaining these key documents from the respective city institutions. We expected resistance and saw two paths before us. The first was a systematic solution aimed at developing a long-term collaboration with the City Department for Urban Planning and Construction and securing their agreement to publish all of the key documents in electronic format. The other choice was more confrontational: we could send Freedom of Information Act requests demanding each individual document. Although Serbia is blessed with what is, on paper, the world’s best FOI law, the procedure itself is time-consuming in practice. The appeals process (including requesting that the Serbian RTI Commissioner intervene to compel disclosure from government bodies) can take many months, often too late to make a difference.

In our case, we first decided to send a friendly letter to the relevant city department. In the letter, we briefly presented our project and asked for an appointment to discuss possible cooperation. After a few days of silence and a few phone calls with officials from the department (which were not promising), we received a formal written response. The response can hardly be characterized as an actual answer: it consisted solely of quotes from city procedures and regulations and did not contain even one sentence responding to our actual questions and requests. While we had anticipated a negative response, we were surprised by the bizarre nature of the rejection.

Thus, we were forced to submit FOI requests for digital versions of the ten currently pending major urban plans in Belgrade together with the minutes of the corresponding meetings of the City Planning Commission. After three weeks of waiting we received another bizarre response arguing that we had no right to ask for these documents in electronic format because they were already publicly available in analog form. This is true, but only partially so: only the graphical portion of the urban plans is available for public scrutiny in analog form. The accompanying text document, which is typically up to 50 pages in length and is crucial for understanding the true nature of the plan, is not publicly available. Nor are the minutes of the meetings of the City Planning Commission.

Next Steps

After receiving the latest response from the city department, we sent a complaint to the FOI Commissioner asking him to directly intervene to compel digital disclosure of the requested information.

Meanwhile, parallel to this correspondence with the city government, a new front opened in the battle for availability of the data. The Ministry for Construction prepared and put out for public comment a draft law on urban planning and construction, which, among other things, defines a potentially updated procedure for the submission and adoption of urban plans. Unfortunately, the draft law does not foresee any significant changes to the rules governing how this information is to be published digitally. On the other hand, there is some indication that, in order to adhere to the country’s overarching anti-corruption strategy, the new law must require urban plans to be publicly accessible in digital format.

We are using that ambiguity and opening to launch a mini-advocacy campaign targeting various public officials who are responsible for finalizing the new law and ensuring its adherence to Serbia’s anti-corruption strategy. Our hope is to codify the requirement for digital disclosure of urban plans under law and avoid a prolonged FOI fight with the city government.

-- Ivan Branisavljevic, Ministry of Space Collective