Reputation: 151
I am new to Python and am trying to download GDP per capita data for each country. I am trying to read the data from this website: https://worldpopulationreview.com/countries/by-gdp
I tried to read the data, but I get a "No tables found" error.
I can see that the data is in r.text, but somehow pandas cannot read that table.
How can I solve this problem and read the data?
import pandas as pd
import requests
url = "https://worldpopulationreview.com/countries/by-gdp"
r = requests.get(url)
raw_html = r.text # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))
Upvotes: 0
Views: 293
Reputation: 25073
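The data is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> and is only rendered into a table by the browser, so the HTML that requests receives has no <table> element for pd.read_html to find.
You have to adjust your script a bit and parse that JSON instead: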
pd.json_normalize(
    json.loads(
        BeautifulSoup(
            requests.get(url).text,
            'html.parser'  # name the parser explicitly to avoid bs4's parser-guessing warning
        ).select_one('#__NEXT_DATA__').text
    )['props']['pageProps']['data']
)
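import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"

# The page ships its data as JSON inside <script id="__NEXT_DATA__">.
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
payload = json.loads(soup.select_one('#__NEXT_DATA__').text)

# Flatten the list of country records into a DataFrame.
df = pd.json_normalize(payload['props']['pageProps']['data'])
df[['continent', 'country', 'pop', 'imfGDP', 'unGDP', 'gdpPerCapita']]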
| | continent | country | pop | imfGDP | unGDP | gdpPerCapita |
|---|---|---|---|---|---|---|
| 0 | North America | United States | 338290 | 2.08938e+13 | 18624475000000 | 61762.9 |
| 1 | Asia | China | 1.42589e+06 | 1.48626e+13 | 11218281029298 | 10423.4 |
| ... | ... | ... | ... | ... | ... | ... |
| 210 | Asia | Syria | 22125.2 | 0 | 22163075121 | 1001.71 |
| 211 | North America | Turks and Caicos Islands | 45.703 | 0 | 917550492 | 20076.4 |
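
If you want to see the mechanism without hitting the network, here is a minimal offline sketch of the same idea (not the code above, just an illustration). It pulls the JSON out of a __NEXT_DATA__ script tag using only the standard library and raises a clear error if the tag is missing. The names NextDataParser, extract_records and SAMPLE_HTML are hypothetical, and the sample HTML holds made-up values, not data from the site.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self._inside:
            self.chunks.append(data)


def extract_records(html, path=("props", "pageProps", "data")):
    """Return the object found at `path` inside the __NEXT_DATA__ JSON."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script found; the page layout may have changed")
    node = json.loads("".join(parser.chunks))
    for key in path:
        node = node[key]  # a KeyError here means the JSON structure differs
    return node


# Hand-written stand-in for the real page (made-up values, not site data).
SAMPLE_HTML = """
<html><body>
<div id="app"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"data": [
  {"country": "A", "gdpPerCapita": 1.5},
  {"country": "B", "gdpPerCapita": 2.5}
]}}}
</script>
</body></html>
"""

records = extract_records(SAMPLE_HTML)
print(records)
```

With the real page you would pass requests.get(url).text instead of SAMPLE_HTML and hand the result to pd.json_normalize, as in the answer. The default path assumes the site still nests its data under props, pageProps, data, which may change over time.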
Upvotes: 1