Reputation: 325
I wanted to export the data in a Wikipedia table to a CSV file. While searching, I found code on this page that uses BeautifulSoup to write the table items to a file.
My case is slightly different: I only want to save the data to a file on my computer. I want to get the table from this wiki page. I ended up with this code:
    from bs4 import BeautifulSoup
    import urllib2

    wiki = "https://en.wikipedia.org/wiki/List_of_minor_planets:_1001%E2%80%932000"
    header = {'User-Agent': 'Mozilla/5.0'}  # Needed to prevent 403 error on Wikipedia
    req = urllib2.Request(wiki, headers=header)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)

    Name = ""
    designation = ""
    date = ""
    site = ""
    discoverer = ""

    table = soup.find("table")
    f = open('output.csv', 'w')

    for row in table.findAll("tr"):
        cells = row.findAll("td")
        # For each "tr", assign each "td" to a variable.
        if len(cells) == 5:
            Name = cells[0].find(text=True)
            designation = cells[1].findAll(text=True)
            date = cells[2].find(text=True)
            site = cells[3].find(text=True)
            discoverer = cells[4].find(text=True)
            for x in range(len(site)):
                write_to_file = (site + ";" + Name + ";" + designation + ";" +
                                 date + ";" + discoverer + "\n")
                print write_to_file
                f.write(write_to_file)

    f.close()
The only differences are that I don't have a "sortable table", so I removed that part from the code, and that I have 5 columns.
However the code returns the following error:
TypeError: coercing to Unicode: need string or buffer, ResultSet found
I believe it is related to the "\n" in the code, since that is where I get the error.
What do you think the problem is, and how can I get around it?
Upvotes: 1
Views: 2122
Reputation: 1
Converting write_to_file with str(write_to_file) helped in both print and f.write.
Upvotes: -2
Reputation: 2896
It's not related to the '\n', the culprit is this line:
designation = cells[1].findAll(text=True)
Notice how this line uses findAll while the others use find. findAll returns a list (actually a ResultSet), even if it finds only a single occurrence. Later, when you build the write_to_file string, concatenating the partial string and designation (which is a ResultSet) raises the error.
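The mismatch can be reproduced without fetching the page. This is only a minimal sketch in Python 3: a plain list stands in for the ResultSet (ResultSet subclasses list, so concatenation fails the same way), and the name and designation values are made up for illustration.

```python
# A plain list stands in for what findAll(text=True) returns.
# The values below are illustrative, not scraped from the page.
name = "Sample name"
designation = ["Sample designation"]

try:
    # Same shape as the failing expression in the question.
    line = name + ";" + designation + "\n"
    failed = False
except TypeError:
    # str + list is undefined; the trailing "\n" is not the cause.
    failed = True

# Joining the fragments first gives a plain string that concatenates fine.
line = name + ";" + "".join(designation) + "\n"
```

The exact wording of the TypeError differs between Python 2 and Python 3, but in both the exception comes from adding a list-like object to a string.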
Replace findAll with find and it works (except for possible encoding errors).
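If a cell can contain several text fragments, an alternative to switching to find is to join the fragments before writing. The following is a hedged sketch in Python 3, not the code from the question: cell_text and write_rows are hypothetical helper names, the sample rows are invented, and nested lists of strings stand in for the per-cell findAll(text=True) results.

```python
import csv
import io


def cell_text(fragments):
    """Join the text fragments of one cell into a single stripped string."""
    return "".join(fragments).strip()


def write_rows(rows, out):
    """Write every 5-cell row to `out` as one semicolon-separated line."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for cells in rows:
        if len(cells) == 5:  # skip header rows and anything malformed
            writer.writerow([cell_text(c) for c in cells])


# Invented sample data: each row is a list of cells, each cell a list of
# text fragments, mimicking findAll(text=True) on every "td".
rows = [
    [["Name A"], ["Desig", " A"], ["Date A"], ["Site A"], ["Person A"]],
    [["not a data row"]],
]

buf = io.StringIO()
write_rows(rows, buf)
```

With the real page, each row would be built as [c.findAll(text=True) for c in row.findAll("td")], and the file could be opened with open("output.csv", "w", encoding="utf-8", newline="") so that non-ASCII names are written without encoding errors.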
Upvotes: 1