Reputation: 6040
I have an HTML table stored in a file. I want to take the value of each td in the table that has an attribute like this:
<td describedby="grid_1-1" ... >Value for CSV</td>
<td describedby="grid_1-1" ... >Value for CSV2</td>
<td describedby="grid_1-1" ... >Value for CSV3</td>
<td describedby="grid_1-2" ... >Value for CSV4</td>
and I want to put it into a CSV file, with each new value taking up a new line in the CSV.
So for the file above, the CSV produced would be :
Value for CSV
Value for CSV2
Value for CSV3
Value for CSV4 would be ignored as describedby="grid_1-2", not "grid_1-1".
I have tried the code below, but no matter what I try there seems to be (a) a blank line between each printed line and (b) a comma separating each character.
So the print is more like :
V,a,l,u,e,f,o,r,C,S,V,
V,a,l,u,e,f,o,r,C,S,V,2
What silly thing have I done now?
Thanks :)
import csv
import os
from bs4 import BeautifulSoup

with open("C:\\Users\\ADMIN\\Desktop\\test.html", 'r') as orig_f:
    soup = BeautifulSoup(orig_f.read())
    results = soup.findAll("td", {"describedby":"grid_1-1"})

with open('C:\\Users\\ADMIN\\Desktop\\Deploy.csv', 'wb') as fp:
    a = csv.writer(fp, delimiter=',')
    for result in results :
        a.writerows(result)
Upvotes: 0
Views: 129
Reputation: 10213
Use the lxml and csv modules.
1. Get the text of each td whose describedby attribute has the value grid_1-1 with the xpath() method of lxml.
2. Open the csv file in write mode.
3. Write each value with the writerow() method.
Code:
content = """
<body>
<td describedby="grid_1-1">Value for CSV</td>
<td describedby="grid_1-1">Value for CSV2</td>
<td describedby="grid_1-1">Value for CSV3</td>
<td describedby="grid_1-2">Value for CSV4</td>
</body>
"""
from lxml import etree
import csv

# Parse the markup and select the text of every td with describedby="grid_1-1".
root = etree.fromstring(content)
l = root.xpath("//td[@describedby='grid_1-1']/text()")

# 'wb' is the Python 2 mode; on Python 3 use open(path, 'w', newline='').
with open('/home/vivek/Desktop/output.csv', 'wb') as fp:
    a = csv.writer(fp, delimiter=',')
    for i in l :
        a.writerow([i, ])  # wrap each value in a list so it is written as one field
Output:
Value for CSV
Value for CSV2
Value for CSV3
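The 'wb' mode in the code above suits Python 2. On Python 3 the csv module expects a text-mode file opened with newline=''; without it, extra blank lines can appear between rows on Windows. A minimal sketch of the writing step, assuming Python 3; the values list and the temporary output path are placeholders for illustration, not part of the original answer:

```python
import csv
import os
import tempfile

# Placeholder values standing in for the td texts extracted earlier.
values = ["Value for CSV", "Value for CSV2", "Value for CSV3"]

# Hypothetical output location; substitute your own path.
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Text mode plus newline='' lets the csv module control line endings itself.
with open(out_path, "w", newline="") as fp:
    writer = csv.writer(fp, delimiter=",")
    for value in values:
        writer.writerow([value])  # one single-field row per value

# Read the file back to check what was written.
with open(out_path, newline="") as fp:
    rows = list(csv.reader(fp))
```

Reading the file back with csv.reader should give one single-field row per value, with no empty rows in between.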
Upvotes: 1
Reputation: 180391
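writerows expects an iterable of rows, each of which is itself an iterable of fields. If result is a string inside a list, passing it directly makes writerows treat the string as a row and iterate over its characters, so you need to wrap it in another list:
a.writerows([result])  # wrap in a list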
In your case you should use writerow and extract the text from each td tag in results:
a.writerow([result.text]) # write the text from td element
You have all the td tags in your results list, so you just need to extract the text from each one with .text.
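The comma-between-every-character symptom can be reproduced without any HTML at all, since it comes purely from how csv.writer iterates over whatever it is given as a row. A small sketch using an in-memory buffer (Python 3 assumed; the helper name render is made up for illustration):

```python
import csv
import io

def render(row):
    """Write one row with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=",").writerow(row)
    return buf.getvalue()

as_string = render("CSV2")    # a str is iterated character by character
as_list = render(["CSV2"])    # a one-element list is written as a single field
```

Here as_string comes out as "C,S,V,2\r\n" while as_list is "CSV2\r\n", which is why wrapping the text in a list before calling writerow fixes the output.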
Upvotes: 3