How to navigate a website and extract data with Python

Question

I am not much of a programmer; I am just learning. I want to extract (public) electoral data from my country's electoral authority using Python. This is for academic purposes, but I also want to develop my programming skills. All of the data I store will be posted publicly, of course.

I need to know which Python modules would let me open web pages and read their HTML to pick out the data I need to collect. I am hoping for some guidelines on how to do this, or any additional suggestions anyone has.
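As a rough illustration of what "reading the HTML" could look like, here is a minimal sketch that uses only the standard library (`html.parser` and `urllib.parse`) to pull every link and its visible text out of a page. The class name `LinkCollector`, the helper `extract_links`, the sample markup, and the `example.org` URL are all made up for the example; the real site's markup is not known here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for the list of States on the real page.
sample = (
    '<ul><li><a href="estado1.html">Amazonas</a></li>'
    '<li><a href="estado2.html">Anzoategui</a></li></ul>'
)
links = extract_links(sample, "http://example.org/resultados/index.html")
for url, name in links:
    print(name, url)
```

To run this against a live page, the HTML string could come from `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`. Third-party packages such as requests and Beautiful Soup are commonly used for the same job and tend to need less code.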

I wish to extract the votes for each party, plus additional data, fully disaggregated: State/Municipality/County/Center/Table. Finally, I hope to store it in a CSV or XLSX file (I guess I'd use openpyxl or xlsxwriter).
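For the storage side, a CSV file needs nothing beyond the standard library's `csv` module. The sketch below assumes one row per party per voting table; the column names in `FIELDS`, the helper `write_rows`, and the sample rows are placeholders rather than real results.

```python
import csv
import io

# Assumed column layout: one row per party per voting table.
FIELDS = ["state", "municipality", "county", "center", "table", "party", "votes"]


def write_rows(rows, fileobj):
    """Write a header line followed by one CSV line per result row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder rows, not real electoral data.
rows = [
    {"state": "Amazonas", "municipality": "Municipality 1", "county": "Parroquia 1",
     "center": "Center 1", "table": 1, "party": "Party A", "votes": 30},
    {"state": "Amazonas", "municipality": "Municipality 1", "county": "Parroquia 1",
     "center": "Center 1", "table": 1, "party": "Party B", "votes": 12},
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("results.csv", "w", newline="", encoding="utf-8")` (the file name is arbitrary). The resulting file opens directly in spreadsheet software, so openpyxl or xlsxwriter would only be needed if a native XLSX file is required.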

My idea is to make a program that:

1) Takes the link input (e.g.);

2) Identifies the links for every State on the left of the HTML page (Amazonas, Anzoategui, and so on);

3) Loops through each State and finds its URL (it's an HTML page, so I guess it'll search for and extract the tag, right?);

4) Repeats with municipalities;

5) Repeats with "Parroquia" (county);

6) Repeats for every voting center;

7) Finally, repeats for every voting table in each center (1, 2, 3... whatever);

8) Next, stores the result for every party (e.g., manually I'd click the name of every candidate, recognize the party's logo, and record its votes (30 in the example)). It should also store the data from the "technical table" at the end.

The final result should contain all the data: State, Municipality, County, Center, Table, and the result for each party.
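The drill-down described in the steps above (State, then Municipality, County, Center, and Table) could be sketched as a small recursive generator. This is only an outline under strong assumptions: it treats every link on a page as a link to the next level down, which a real page with menus or footers would not satisfy, and it runs against an in-memory dictionary of made-up pages (`PAGES`) instead of the real site. The names `crawl`, `child_links`, `LEVELS`, and the `fetch` parameter are all invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Levels of the hierarchy, from the start page down to a single voting table.
LEVELS = ["state", "municipality", "county", "center", "table"]


class _Links(HTMLParser):
    """Gather (href, text) for each <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._href = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._buf = []

    def handle_data(self, data):
        if self._href:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.found.append((self._href, "".join(self._buf).strip()))
            self._href = None


def child_links(html, base_url):
    """Assumption: every link on the page leads one level further down."""
    parser = _Links()
    parser.feed(html)
    parser.close()
    return [(urljoin(base_url, href), text) for href, text in parser.found]


def crawl(url, fetch, depth=0, path=()):
    """Yield (labels, url) for every voting-table page below `url`.

    `fetch` is any function that takes a URL and returns its HTML as a string.
    """
    if depth == len(LEVELS):
        # Reached a table page; its own parsing (votes per party) is not shown.
        yield dict(zip(LEVELS, path)), url
        return
    for child_url, name in child_links(fetch(url), url):
        yield from crawl(child_url, fetch, depth + 1, path + (name,))


# Made-up pages standing in for the real site.
PAGES = {
    "http://example.org/index.html": '<a href="s1.html">Amazonas</a>',
    "http://example.org/s1.html": '<a href="m1.html">Municipality 1</a>',
    "http://example.org/m1.html": '<a href="p1.html">Parroquia 1</a>',
    "http://example.org/p1.html": '<a href="c1.html">Center 1</a>',
    "http://example.org/c1.html": '<a href="t1.html">1</a><a href="t2.html">2</a>',
}

results = list(crawl("http://example.org/index.html", PAGES.__getitem__))
for labels, table_url in results:
    print(labels, table_url)
```

Passing `fetch` as a parameter keeps the traversal testable offline. For the live site it could be a function built on `urllib.request.urlopen`, ideally with a short `time.sleep` between requests so the server is not flooded. Extracting the party results from each table page would still need a parser written against that page's actual markup.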
