Reputation: 9348
(Environment: Python 2.7.6 in the IDLE shell, with BeautifulSoup 4.3.2)
I want to extract some text from a batch of files (about 50) and write it neatly into an Excel file, either row by row or column by column.
Each file contains text like the sample below:
<tr>
<td width=25%>
Arnold Ed
</td>
<td width=15%>
18 Feb 1959
</td>
</tr>
<tr>
<td width=15%>
男性
</td>
<td width=15%>
02 March 2002
</td>
</tr>
<tr>
<td width=15%>
Guangxi
</td>
</tr>
What I have worked out so far is shown below. The approach is to read the files one by one. The code runs fine up to the text-extraction part, but the results are not written to the Excel file correctly.
from bs4 import BeautifulSoup
import xlwt
list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
for each_file in line_in_list:
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    all_texts = soup.find_all("td")
    for a_t in all_texts:
        a = a_t.renderContents()
        #"print a" here works ok
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
sheet.write (0, 0, a)
book.save("C:\\details.xls")
In fact, it only writes the last piece of text into the Excel file. How can I get all of it written correctly?
With laike9m's help, the final version is:
from bs4 import BeautifulSoup
import xlwt
list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
# Create the workbook and sheet once, before the loops
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
for i,each_file in enumerate(line_in_list):
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    all_texts = soup.find_all("td")
    for j,a_t in enumerate(all_texts):
        a = a_t.renderContents()
        # Row i is the file, column j is the <td> cell within it
        sheet.write(i, j, a)
# Save once, after every cell has been written
book.save("C:\\details.xls")
Upvotes: 0
Views: 155
Reputation: 19368
You didn't put the last four lines inside the for loop, so they run only once, after the loops have finished, when a holds just the last value. I guess that's why it's only writing the last piece of text into the Excel file.
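The effect can be reproduced without BeautifulSoup or xlwt. The snippet below is only a minimal sketch: the values list and the cells dict are hypothetical stand-ins for the extracted <td> texts and the worksheet, not part of the original script.

```python
# Stand-in data; the real script gets these values from the <td> cells.
values = ["Arnold Ed", "18 Feb 1959", "Guangxi"]

# Hypothetical stand-in for the worksheet: maps (row, col) to a value.
cells = {}

for v in values:
    a = v  # a is overwritten on every pass through the loop

# This line sits outside the loop, so it runs once, with the last value only.
cells[(0, 0)] = a

print(cells)
```

Only one cell is ever written, and it holds whatever a was assigned on the final pass.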
EDIT
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
for i, each_file in enumerate(line_in_list):
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    all_texts = soup.find_all("td")
    for j, a_t in enumerate(all_texts):
        a = a_t.renderContents()
        sheet.write(i, j, a)
book.save("C:\\details.xls")
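To check how the nested enumerate calls map each value to its own cell, here is a small sketch that runs without any third-party packages. FakeSheet and fill_sheet are hypothetical names introduced for illustration; FakeSheet only mimics the write(row, col, value) call shape used above and is not xlwt's actual class.

```python
class FakeSheet(object):
    """Hypothetical stand-in for a worksheet: records each write in a dict."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def fill_sheet(sheet, rows):
    """Write rows[i][j] to cell (i, j): one row per file, one column per value."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            sheet.write(i, j, value)


# Stand-in for the texts extracted from two files.
rows = [["Arnold Ed", "18 Feb 1959"], ["Guangxi"]]
sheet = FakeSheet()
fill_sheet(sheet, rows)
print(sorted(sheet.cells.items()))
```

Because the row index comes from the outer loop and the column index from the inner one, no two values share a cell, so nothing gets overwritten.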
Upvotes: 1