Reputation: 197
I am reading an HTML table from an HTML file into pandas, and I want to get it as a DataFrame rather than a list so that I can perform general DataFrame operations.
I get the error below whenever I try anything other than printing the whole result.
print(dfdefault.shape())
AttributeError: 'list' object has no attribute 'shape'
Upvotes: 2
Views: 13195
Reputation: 4544
The pandas read_html() function returns a list of DataFrames, one for each table found on the page. Using the Stack Overflow leagues page as an example, we can see that there are two tables on the right side of the page. As you can see below, a list is what read_html() returns.
import pandas as pd

url = 'https://stackexchange.com/leagues/1/alltime/stackoverflow'
df_list = pd.read_html(url)
print(df_list)
# [ Rep Change* Users <-- first table
# 0 10,000+ 15477
# 1 5,000+ 33541
# 2 2,500+ 68129
# 3 1,000+ 155430
# 4 500+ 272683
# 5 250+ 429742
# 6 100+ 458600
# 7 50+ 458600
# 8 1+ 458600,
# Total Rep* Users <-- second table
# 0 100,000+ 697
# 1 50,000+ 1963
# 2 25,000+ 5082
# 3 10,000+ 15477
# 4 5,000+ 33541
# 5 3,000+ 56962
# 6 2,000+ 84551
# 7 1,000+ 155430
# 8 500+ 272683
# 9 200+ 458600
# 10 1+ 10381503]
print(len(df_list))
# 2
From here, you just need to pick the table you want by indexing into the list. If there's only one table, it is simply the first element, df_list[0].
df = df_list[0]
print(df)
# Rep Change* Users
# 0 10,000+ 15477
# 1 5,000+ 33541
# 2 2,500+ 68129
# 3 1,000+ 155430
# 4 500+ 272683
# 5 250+ 429742
# 6 100+ 458600
# 7 50+ 458600
# 8 1+ 458600
print(df.shape)
# (9, 2)
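If you would rather not rely on the table's position in the list, read_html() also accepts a match argument that keeps only the tables containing the given text. Below is a minimal sketch of that approach. The inline HTML string is made up for illustration and only mimics the two tables above, and it assumes an HTML parser such as lxml (or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real page: two small tables.
html = """
<table>
  <thead><tr><th>Rep Change*</th><th>Users</th></tr></thead>
  <tbody>
    <tr><td>10,000+</td><td>15477</td></tr>
    <tr><td>5,000+</td><td>33541</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Total Rep*</th><th>Users</th></tr></thead>
  <tbody>
    <tr><td>100,000+</td><td>697</td></tr>
    <tr><td>50,000+</td><td>1963</td></tr>
    <tr><td>25,000+</td><td>5082</td></tr>
  </tbody>
</table>
"""

# Without match, every table on the page comes back in one list.
tables = pd.read_html(StringIO(html))
print(type(tables).__name__, len(tables))

# With match, only tables containing the text are returned.
# The result is still a list, so take the first element.
total_rep = pd.read_html(StringIO(html), match="Total Rep")[0]
print(total_rep.shape)
```

Note that even with match the return value is a list, so the [0] at the end is still needed before calling DataFrame attributes such as shape.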
Upvotes: 7