pandas read_html ignore <br> and concatenate strings

Question

I am trying to collect the tables from https://en.wikipedia.org/wiki/List_of_English_monarchs as follows:

 import pandas as pd
 url = "https://en.wikipedia.org/wiki/List_of_English_monarchs"
 spacer = lambda s: s.replace('\xa0', ' ').replace('[q]', ' ').replace('\u2009',' ')

 dfs = pd.read_html(url,attrs={"class":'wikitable'},converters={'Name':spacer,
                                                               'Birth':spacer,
                                                               'Marriages':spacer,
                                                               'Death':spacer})

It works really well, except that when a cell contains a <br> tag, no whitespace is added in its place. For example, the first item in the first column, "Name", comes out as:

'Edward the Elder26 October 899–17 July 924(24 years, 266 days)'
where it should have been
'Edward the Elder 26 October 899 – 17 July 924 (24 years, 266 days)'

The end goal is to be able to extract the dates from that column.

Jack Fleeting · Accepted Answer

Maybe something like this:

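from io import StringIO
import pandas as pd
import requests

url = "https://en.wikipedia.org/wiki/List_of_English_monarchs"
kings = requests.get(url)
# swap each line-break tag for a space before parsing (use '<br>' instead if the page source spells it that way)
df = pd.read_html(StringIO(kings.text.replace('<br />', ' ')))
# using the first column as an example
print(df[0]['Name'])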

Output :

0                                               Alfred the Great (King of Wessex from 871) c. 886 – 26 October 899
1                                               Edward the Elder 26 October 899 – 17 July 924 (24 years, 266 days)
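
If the page mixes `<br>`, `<br/>` and `<br />`, a regular expression may be more forgiving than a literal string replace, and once the spaces are back the dates can be pulled out with a second pattern. Here is a minimal sketch using only the standard library. The names `replace_br` and `extract_dates` and the sample `cell_html` are made up for illustration, and the date pattern assumes the "day month year" form seen in the output above.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case
# (assumption: the page source may use any of these spellings).
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)

# Matches dates written like "26 October 899": day, month name, 3-4 digit year.
DATE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{3,4}\b"
)


def replace_br(html: str) -> str:
    """Return html with every <br> variant replaced by a single space."""
    return BR_TAG.sub(" ", html)


def extract_dates(cell: str) -> list[str]:
    """Return every 'day month year' date found in a cell, in order."""
    return DATE.findall(cell)


# Hypothetical cell markup, modelled on the "Name" column in the question.
cell_html = "Edward the Elder<br />26 October 899<br>–<br/>17 July 924"
cleaned = replace_br(cell_html)
print(cleaned)                 # Edward the Elder 26 October 899 – 17 July 924
print(extract_dates(cleaned))  # ['26 October 899', '17 July 924']
```

The result of `replace_br(kings.text)` could then be handed to `pd.read_html` in place of the literal `.replace(...)` call. Approximate dates such as "c. 886" are not matched by this pattern and would need their own rule.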
