Reputation: 3716
I have a list of BeautifulSoup objects that I am trying to parse further to get the contents of the table cells. My output should be a list of lists with 3 items each, since the table has 3 columns.
file = <html><p><center><h1> Interference Report </h1></center><p>
<b> Interference Report Project File: </b>C:\Users\ksobon\Documents\test_project_03_ksobon.rvt <br> <b> Created: </b> Monday, May 26, 2014 7:52:32 PM <br> <b> Last Update: </b> <br>
<p><table border=on> <tr> <td></td> <td ALIGN="center">A</td> <td ALIGN="center">B</td> </tr>
<tr> <td> 1 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021 </td> <td> Workset1 : Furniture : FUR_BoardroomTable10Chairs_gm : Board Room Layout : id 482259 </td> </tr>
<tr> <td> 2 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 483442 </td> </tr>
<tr> <td> 3 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469060 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475041 </td> </tr>
<tr> <td> 4 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469109 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475273 </td> </tr>
<tr> <td> 5 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469178 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475510 </td> </tr>
<tr> <td> 6 </td> <td> Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469178 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 482306 </td> </tr>
<tr> <td> 7 </td> <td> whatever : Doors : DOR_Single_gm : 800w, 2100h (720Leaf) - Mark 102B : id 472052 </td> <td> Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1200 H x 1200 W - Mark 102B : id 472822 </td> </tr>
<tr> <td> 8 </td> <td> whatever : Doors : DOR_Single_gm : 800w, 2100h (720Leaf) - Mark 101A : id 472376 </td> <td> Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1200 H x 1200 W - Mark 101C : id 472720 </td> </tr>
<tr> <td> 9 </td> <td> Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1800 H x 1200 W 2 - Mark 101B : id 472688 </td> <td> Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 482306 </td> </tr>
</table>
<p><b> End of Interference Report </b>
</html>
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(file)
tag = soup.findAll('tr')
txt = []
for i in tag:
    txt.append(i.findAll('td'))
Now I want to convert each sublist element to text, so I tried:
txt1 = [i.text for x in txt for i in x]
However, txt1 comes out as a flat list instead of a list of lists. What am I doing wrong?
Upvotes: 0
Views: 695
Reputation: 180502
Build an inner list of i.text values for each row by nesting the comprehension:
txt1 = [[i.text for i in x] for x in txt]
Your list comprehension flattens the list: with two for clauses in a single comprehension, all the elements are extracted into one list.
l = [[1,2],[2,3],[5,6]]
flatten_l = [x for y in l for x in y]
print (flatten_l)
[1, 2, 2, 3, 5, 6]
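To keep the grouping, the inner loop can be moved into its own bracketed comprehension. Here is a minimal sketch on the same list l; the names flat and nested and the doubling step are only illustrative, standing in for whatever per-element conversion is needed:

```python
l = [[1, 2], [2, 3], [5, 6]]

# Two "for" clauses in one comprehension produce a single flat list.
flat = [x * 2 for y in l for x in y]

# A comprehension inside a comprehension builds one inner list per sublist.
nested = [[x * 2 for x in y] for y in l]

print(flat)    # [2, 4, 4, 6, 10, 12]
print(nested)  # [[2, 4], [4, 6], [10, 12]]
```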
Maybe you need map:
l=[[1,2,4],[2,3,5],[5,6,7]]
print [map(str, s) for s in l]
[['1', '2', '4'], ['2', '3', '5'], ['5', '6', '7']]
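Note that the snippet above relies on Python 2, where map returns a list. Under Python 3, map returns a lazy iterator, so each result would need to be wrapped in list() to get the same output. A small sketch, assuming Python 3 (the name as_text is illustrative):

```python
l = [[1, 2, 4], [2, 3, 5], [5, 6, 7]]

# list() forces the lazy map object into a concrete list for each sublist.
as_text = [list(map(str, s)) for s in l]

print(as_text)  # [['1', '2', '4'], ['2', '3', '5'], ['5', '6', '7']]
```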
Using your code, this calls i.text on each element while maintaining the structure:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(file)
tag = soup.findAll('tr')
# one list of <td> tags per table row
txt = [i.findAll('td') for i in tag]
# one empty result list per row
final = [[] for x in range(len(txt))]
for j, k in enumerate(txt):
    for i in k:
        final[j].append(i.text)
print final
[[u'', u'A', u'B'], [u'1', u'Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021', u'Workset1 : Furniture : FUR_BoardroomTable10Chairs_gm : Board Room Layout......
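The enumerate loop above can probably be collapsed into one nested comprehension. The sketch below is not the original answer's code: it uses hypothetical stand-in objects (SimpleNamespace with a text attribute) in place of real td tags so that it runs without BeautifulSoup installed, and the cell contents are abbreviated sample data. With the real parser, txt would be the list built from findAll('td'). The strip() call is an optional addition for trimming whitespace around cell text.

```python
from types import SimpleNamespace


def cell(text):
    """Hypothetical stand-in for a parsed <td> tag; only .text is modelled."""
    return SimpleNamespace(text=text)


# Abbreviated sample rows, shaped like the list of lists of <td> tags.
txt = [
    [cell(""), cell("A"), cell("B")],
    [cell(" 1 "), cell(" Workset1 : Walls "), cell(" Workset1 : Furniture ")],
]

# One inner list per row, one stripped string per cell.
final = [[td.text.strip() for td in row] for row in txt]

print(final)
# [['', 'A', 'B'], ['1', 'Workset1 : Walls', 'Workset1 : Furniture']]
```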
Upvotes: 1