Vvishvesh

Reputation: 11

How to extract data from a table on a web page using Beautiful Soup

I want to extract the data from the table given in 'https://statisticstimes.com/demographics/india/indian-states-population.php' and put it in a list or a dictionary.

I am a beginner in Python. Based on what I have learned so far, this is as far as I got:

import urllib.request

from bs4 import BeautifulSoup

url = input("Enter url: ")
html = urllib.request.urlopen(url).read()
x = BeautifulSoup(html, "html.parser")

# Collect the <td> cells of every table row
tags = x("tr")
lst = []
for tag in tags:
    lst.append(tag.find_all("td"))

print(lst)

Upvotes: 1

Views: 39

Answers (1)

baduker

Reputation: 20052

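You can use requests to fetch the page and pandas to parse the table; tabulate is only used to pretty-print the result. Note that flavor="bs4" needs both beautifulsoup4 and html5lib installed.

Here's how: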

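from io import StringIO

import pandas as pd
import requests
from tabulate import tabulate

url = "https://statisticstimes.com/demographics/india/indian-states-population.php"
# Wrap the HTML in StringIO: passing a raw HTML string to read_html is deprecated since pandas 2.1.
# [-1] picks the last table on the page.
df = pd.read_html(StringIO(requests.get(url).text), flavor="bs4")[-1]
print(tabulate(df.head(10), showindex=False))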

Output:

---  ----------------  --------  --------  -------  -----  ----  --------------------  ---
NCT  Delhi             18710922  16787941  1922981  11.45  1.36  Malawi                 63
18   Haryana           28204692  25351462  2853230  11.25  2.06  Venezuela              51
14   Kerala            35699443  33406061  2293382   6.87  2.6   Morocco                41
20   Himachal Pradesh   7451955   6864602   587353   8.56  0.54  China, Hong Kong SAR  104
16   Punjab            30141373  27743338  2398035   8.64  2.2   Mozambique             48
12   Telangana         39362732  35004000  4358732  12.45  2.87  Iraq                   36
25   Goa                1586250   1458545   127705   8.76  0.12  Bahrain               153
19   Uttarakhand       11250858  10086292  1164566  11.55  0.82  Haiti                  84
UT3  Chandigarh         1158473   1055450   103023   9.76  0.08  Eswatini              159
9    Gujarat           63872399  60439692  3432707   5.68  4.66  France                 23
---  ----------------  --------  --------  -------  -----  ----  --------------------  ---
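
The question asks for a list or a dictionary rather than a DataFrame, so here is a minimal sketch of the conversion. It uses a small stand-in DataFrame built from three rows of the output above so that it runs without network access; the column names "state" and "population" are my own assumptions, not the headers used on the page.

```python
import pandas as pd

# Stand-in for the scraped table: three rows taken from the output above,
# trimmed to two columns. The column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "state": ["Delhi", "Haryana", "Kerala"],
        "population": [18710922, 28204692, 35699443],
    }
)

# One inner list per table row
rows_as_lists = df.values.tolist()

# One dictionary per table row, keyed by column name
rows_as_dicts = df.to_dict(orient="records")

# A single dictionary mapping one column to another
by_state = df.set_index("state")["population"].to_dict()

print(rows_as_lists[0])
print(rows_as_dicts[0])
print(by_state["Kerala"])
```

The same three calls should work on the df returned by read_html, once you substitute the column labels that the real table carries.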

With:

df.to_csv("your_table.csv", index=False)

you can dump the table to a .csv file.
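
If you would rather stay with Beautiful Soup only, as in the question, the missing step is to pull the text out of each cell instead of storing the tag objects. Below is a rough sketch of that idea. SAMPLE_HTML is a made-up miniature table (not the real page markup) and table_rows is a hypothetical helper name, so treat it as an illustration under those assumptions.

```python
from bs4 import BeautifulSoup

# Made-up miniature table for illustration; the real page has more columns.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Population</th></tr>
  <tr><td>Delhi</td><td>18710922</td></tr>
  <tr><td>Haryana</td><td>28204692</td></tr>
</table>
"""


def table_rows(html):
    """Return the text of every <td> cell, one list per <tr> row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows holding only <th> cells yield no <td> text, skip them
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)

# Build a dictionary from the two columns of the sample table
populations = {name: int(count) for name, count in rows}

print(rows)
print(populations)
```

Against the real page you would pass in the HTML you downloaded (for example requests.get(url).text). Since the page seems to contain more than one table, you may need to narrow the search first, for instance to soup.find_all("table")[-1], which mirrors the [-1] used with read_html above.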

Upvotes: 1
