Martin598

Reputation: 1601

Pandas - Reading HTML

I am trying to convert this table into a pandas DataFrame.

This is what I have so far:

import pandas as pd

url = 'http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/Helarsstatistik---Riket/26046/'

# read_html returns a list of DataFrames, one per table found on the page
df = pd.read_html(url, thousands=' ')
df2 = df[0]

My problem is that pandas does not recognize that the row at index 0 contains the headers. I also want the År column to be the index.

Lastly, I would like to draw a line plot with the År values on the X axis and the Folkmängd values on the Y axis.

Thank you in advance.

Upvotes: 4

Views: 6953

Answers (1)

Padraic Cunningham

Reputation: 180411

This should be close to what you want:

import pandas as pd
import matplotlib.pyplot as plt

# Use the ggplot look for the chart
plt.style.use('ggplot')

url = 'http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/Helarsstatistik---Riket/26046/'

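# header=0 takes the column names from the first row;
# index_col=0 makes the first column (År here) the index
table = pd.read_html(url, thousands=' ', header=0, index_col=0)[0]
# Series.plot draws a line with the index on the X axis by default
table["Folkmängd"].plot(color='k')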
plt.show()

This should give you a line plot of Folkmängd (Y) against År (X).
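If the table has already been loaded without header=0, the same result can probably be reached after the fact. The following is only a minimal sketch under assumptions: the small `raw` frame stands in for what read_html returns in that case, its numbers are made up for illustration (they are not SCB figures), and `promote_header` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Illustrative stand-in for a table loaded without header=0:
# the labels sit in row 0 and the columns are just numbered 0, 1.
# The values are invented for this example, not real statistics.
raw = pd.DataFrame([
    ["År", "Folkmängd"],
    ["2001", "1 000"],
    ["2002", "1 050"],
    ["2003", "1 120"],
])


def promote_header(frame, index_col):
    """Use row 0 as the column labels, then index the frame by index_col."""
    out = frame.iloc[1:].copy()           # everything below the label row
    out.columns = frame.iloc[0].tolist()  # row 0 becomes the column names
    return out.set_index(index_col)


table = promote_header(raw, "År")

# The cells are still strings: strip the space used as thousands
# separator and convert both the values and the index to integers.
table["Folkmängd"] = (
    table["Folkmängd"].str.replace(" ", "", regex=False).astype(int)
)
table.index = table.index.astype(int)
```

After this, table["Folkmängd"].plot() should draw the same kind of line plot, with År on the X axis. Passing header=0 and index_col=0 directly to read_html is still the shorter route when you can reload the page.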

Upvotes: 3
