Durious

Reputation: 1

Python - BeautifulSoup - Extracting Table Data with tags stuck

In Python, I am trying to read a table from an HTML file and store its cell values in lists so that I can later compare them and detect changes in the table data. Using mechanize, I was able to automate downloading the HTML page, which sits behind an ID/password login. The second part, placing the data into lists, produces the output below, with the tags still in place. So while I appear to have solved storing the data, I am unsure how to remove the tags before appending the data to the lists.

Link to the HTML document I am trying to pull data from: https://www.dropbox.com/s/b684ecl7b2l3m10/guildwar.html?dl=0

Sample output (top part only):

[None, None, None, <td class="t1"> 1 </td>, <td class="t1"> 2 </td>, <td class="t1"> 3 </td>]

Code:

from bs4 import BeautifulSoup

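# Name the parser explicitly and close the file once it has been parsed.
with open("guildwar.html") as f:
    soup = BeautifulSoup(f, "html.parser")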

rank_0 = []
color_1 = []
name_2 = []
land_3 = []
fortress_4 = []
power_5 = []


for el in soup.find_all('tr'):
    rank = el.find('td', {'class':'t1'})
    rank_0.append(rank)
    color = el.find('td', {'class':'t2'})
    color_1.append(color)
    name = el.find('td', {'class':'t3'})
    name_2.append(name)
    land = el.find('td', {'class':'t4'})
    land_3.append(land)
    fortress = el.find('td', {'class':'t5'})
    fortress_4.append(fortress)
    power = el.find('td', {'class':'t6'})
    power_5.append(power)

print("Ranking")
print(rank_0)
print("\nMagic Color")
print(color_1)
print("\nMage Name")
print(name_2)
print("\nLand")
print(land_3)
print("\nFortress")
print(fortress_4)
print("\nPower")
print(power_5)

===============================

Upvotes: 0

Views: 62

Answers (1)

Anzel

Reputation: 20553

You can use the text attribute of the element, like this:

In [1]: from bs4 import BeautifulSoup

In [2]: s = '<tr><td class="t1"> 1 </td>, <td class="t1"> 2 </td>,       <td class="t1"> 3 </td></tr>'

In [4]: soup = BeautifulSoup(s, "lxml")

In [5]: for el in soup.find_all('tr'):
   ...:     rank = el.find('td', {'class': 't1'})
   ...:     print("Ranking > ", rank.text) # use text attribute
   ...:     
Ranking >   1 
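
Applied to the lists from the question, the same idea could look roughly like the sketch below. It assumes the real page uses the t1 and t3 classes the way the question's code implies; the sample rows and the cell_text helper are made up for illustration. Rows without a matching cell (a header row, for example) return None from find, which is where the None entries in the question's output come from, so the sketch skips them. get_text(strip=True) also removes the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows; the real guildwar.html layout is assumed to be similar.
html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td class="t1"> 1 </td><td class="t3"> Alice </td></tr>
  <tr><td class="t1"> 2 </td><td class="t3"> Bob </td></tr>
</table>
"""


def cell_text(row, css_class):
    """Return the stripped text of the first matching <td>, or None if absent."""
    cell = row.find("td", {"class": css_class})
    return cell.get_text(strip=True) if cell is not None else None


soup = BeautifulSoup(html, "html.parser")

rank_0 = []
name_2 = []

for row in soup.find_all("tr"):
    rank = cell_text(row, "t1")
    if rank is None:
        # Header rows (or any row without a t1 cell) carry no data; skip them.
        continue
    rank_0.append(rank)
    name_2.append(cell_text(row, "t3"))

print(rank_0)
print(name_2)
```

The remaining columns (t2, t4, t5, t6) would follow the same pattern.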

On a side note, I would probably store the whole <table> and check whether it changes over time. That saves you from comparing every individual column, and you only need to store the data when there is an update.
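
One possible way to do that whole-table check is to hash the table's text and compare the hash between downloads. This is only a sketch of the idea: hashing is my own choice of technique, not something the approach above requires, and the table_fingerprint function and the two sample snapshots are hypothetical.

```python
import hashlib

from bs4 import BeautifulSoup


def table_fingerprint(html):
    """Return a SHA-256 hex digest of the first <table>'s text, or None if no table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return None
    # Join the cell texts with single spaces so pure whitespace changes are ignored.
    text = table.get_text(" ", strip=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical snapshots of the same page taken at two different times.
old_html = '<table><tr><td class="t1"> 1 </td><td class="t6"> 500 </td></tr></table>'
new_html = '<table><tr><td class="t1"> 1 </td><td class="t6"> 650 </td></tr></table>'

if table_fingerprint(old_html) != table_fingerprint(new_html):
    print("Table changed: parse the columns and store the new data")
else:
    print("No change: nothing to store")
```

Only when the fingerprints differ would you need to extract the individual columns and compare them.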

Upvotes: 1
