Steven González

Reputation: 257

Adding href to pandas .read_html DF

I want to build a table from the information available on this website. The table should have three columns: (0) series/date, (1) title and (2) link. I already have the first two columns, but I don't know how to get the link for each entry.

import pandas as pd
import requests
url = "http://legislaturautuado.com/pgs/resolutions.php?st=5&f=2016"
r = requests.get(url)
df_list = pd.read_html(r.text)
df = df_list[0]
df.head()


Is it possible to get what I want using only pandas?

Upvotes: 2

Views: 1948

Answers (1)

As far as I know, it's not possible with pandas alone (at least before pandas 1.5, which added an extract_links parameter to read_html). It can be done with BeautifulSoup, though:

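import requests
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

url = "http://legislaturautuado.com/pgs/resolutions.php?st=5&f=2016"
r = requests.get(url)

# Parse the page with an explicit parser and keep only the first <table>
html_table = BeautifulSoup(r.text, 'html.parser').find('table')
r.close()

# Build the DataFrame from that table; the first row is used as the header
df = pd.read_html(StringIO(str(html_table)), header=0)[0]
# Add the href of every <a> in the table (assumes exactly one link per data row)
df['Link'] = [link.get('href') for link in html_table.find_all('a')]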
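
The list comprehension above only lines up with the DataFrame if every data row contains exactly one link. If that might not hold, a row-by-row variant is a possible alternative. The following is only a sketch: SAMPLE_HTML is made-up markup standing in for the real page, and table_with_links is a hypothetical helper name, not part of pandas or BeautifulSoup. It assumes the first row of the table is the header.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<table>
  <tr><th>Series</th><th>Title</th></tr>
  <tr><td>R-001</td><td><a href="/docs/r001.pdf">First resolution</a></td></tr>
  <tr><td>R-002</td><td>Second resolution (no link)</td></tr>
  <tr><td>R-003</td><td><a href="/docs/r003.pdf">Third resolution</a></td></tr>
</table>
"""


def table_with_links(html):
    """Build a DataFrame from the first <table>, adding one 'Link' value per row."""
    table = BeautifulSoup(html, "html.parser").find("table")
    rows = table.find_all("tr")

    # Assumption: the first row holds the column names
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]

    records = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Take the first link in the row, or None when the row has no link
        anchor = tr.find("a")
        link = anchor.get("href") if anchor else None
        records.append(cells + [link])

    return pd.DataFrame(records, columns=header + ["Link"])


df = table_with_links(SAMPLE_HTML)
print(df)
```

Because the link is looked up inside each row, a row without a link gets a missing value instead of shifting every later link up by one.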

Upvotes: 4
