konrad

Reputation: 3716

Perform a BeautifulSoup operation on a list of lists while maintaining structure in Python

I have a list of BeautifulSoup objects that I am trying to parse further for the contents of the table cells. My output becomes a list of lists with 3 items each, since the table has 3 columns.

file = <html><p><center><h1>  Interference Report  </h1></center><p>
<b>  Interference Report Project File:  </b>C:\Users\ksobon\Documents\test_project_03_ksobon.rvt  <br>  <b>  Created:  </b>  Monday, May 26, 2014 7:52:32 PM  <br>  <b>  Last Update:  </b>    <br>
 <p><table border=on>  <tr>  <td></td>  <td ALIGN="center">A</td>  <td  ALIGN="center">B</td>  </tr>
<tr>  <td>  1  </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021     </td>  <td>  Workset1 : Furniture : FUR_BoardroomTable10Chairs_gm : Board Room Layout : id   482259  </td>  </tr>
<tr>  <td>  2  </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021    </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 483442  </td>  </tr>
<tr>  <td>  3  </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469060    </td>  <td>  Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475041  </td>  </tr>
<tr>  <td>  4  </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469109   </td>  <td>  Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475273  </td>  </tr>
<tr>  <td>  5  </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469178   </td>  <td>  Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 475510  </td>  </tr>
<tr>  <td>  6  </td>  <td>  Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469178    </td>  <td>  Workset1 : Furniture : FUR_Sofa_gm : 2100mm : id 482306  </td>  </tr>
<tr>  <td>  7  </td>  <td>  whatever : Doors : DOR_Single_gm : 800w, 2100h (720Leaf) -  Mark 102B : id 472052  </td>  <td>  Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1200 H   x 1200 W - Mark 102B : id 472822  </td>  </tr>
<tr>  <td>  8  </td>  <td>  whatever : Doors : DOR_Single_gm : 800w, 2100h (720Leaf) -  Mark 101A : id 472376  </td>  <td>  Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1200 H   x 1200 W - Mark 101C : id 472720  </td>  </tr>
<tr>  <td>  9  </td>  <td>  Workset1 : Windows : WIN-ConceptWindowFixed_gm : 1800 H x  1200 W 2 - Mark 101B : id 472688  </td>  <td>  Workset1 : Furniture : FUR_Sofa_gm : 2100mm   : id 482306  </td>  </tr>
</table>
<p><b>  End of Interference Report  </b>
</html>

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(file)
tag = soup.findAll('tr')

# one sublist of <td> tags per table row
txt = []
for i in tag:
    txt.append(i.findAll('td'))

Now I want to convert each sublist element to text, so I tried:

txt1 = [i.text for x in txt for i in x]

My output for txt1, however, comes out as a flat list instead of a list of lists. What am I doing wrong?

Upvotes: 0

Views: 695

Answers (1)

Padraic Cunningham

Reputation: 180502

Build one inner list per row, so that i.text is collected inside a nested list comprehension:

txt1 = [[i.text for i in x] for x in txt]

Your list comprehension flattens the list: its two for clauses walk through every cell of every row and collect all the elements into one list.

l = [[1,2],[2,3],[5,6]]

flatten_l = [x for y in l for x in y]
print (flatten_l)
[1, 2, 2, 3, 5, 6]
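
The difference comes down to where the for clauses sit. Here is a minimal sketch with plain Python objects, assuming that each cell only needs a .text attribute. The Cell class and the names rows, flat and nested are hypothetical stand-ins for the td tags and are not part of the original code.

```python
# Hypothetical stand-in for a parsed <td> tag: it only exposes a .text attribute.
class Cell(object):
    def __init__(self, text):
        self.text = text

# Two rows of three cells each, mimicking the list of lists built from findAll('td').
rows = [[Cell("1"), Cell("a"), Cell("b")],
        [Cell("2"), Cell("c"), Cell("d")]]

# Both for clauses in one comprehension: every cell lands in a single flat list.
flat = [cell.text for row in rows for cell in row]

# An inner comprehension per row: one sublist of texts per original row.
nested = [[cell.text for cell in row] for row in rows]

print(flat)    # ['1', 'a', 'b', '2', 'c', 'd']
print(nested)  # [['1', 'a', 'b'], ['2', 'c', 'd']]
```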

To transform every element while keeping the sublists intact, you can map over each sublist:

l=[[1,2,4],[2,3,5],[5,6,7]]

print [map(str, s) for s in l]

[['1', '2', '4'], ['2', '3', '5'], ['5', '6', '7']]
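
The snippet above is Python 2, where map returns a list. Here is a sketch of the same idea that should also run under Python 3, where map returns a lazy iterator and therefore needs wrapping in list(). The variable name as_strings is introduced here only for illustration.

```python
l = [[1, 2, 4], [2, 3, 5], [5, 6, 7]]

# list(...) materializes each map object so the sublists are real lists.
as_strings = [list(map(str, s)) for s in l]

print(as_strings)  # [['1', '2', '4'], ['2', '3', '5'], ['5', '6', '7']]
```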

Applied to your code, this calls i.text on each element while maintaining the row structure:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(file)

tag = soup.findAll('tr')
txt=[(i.findAll('td')) for i in tag]
final=[[] for x in range(len(txt))]
for j,k in enumerate(txt):
    for i in k:
        final[j].append(i.text)  

print final
[[u'', u'A', u'B'], [u'1', u'Workset1 : Walls : Basic Wall : E103-CON 100mm : id 469021', u'Workset1 : Furniture : FUR_BoardroomTable10Chairs_gm : Board Room Layout......
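
For readers who want to try the row-per-sublist idea without installing anything, here is a rough sketch that swaps BeautifulSoup for the Python 3 standard library's html.parser. It is not the code from the question. The TableRows class and the shortened sample string are assumptions made up for this illustration.

```python
from html.parser import HTMLParser  # Python 3 standard library


class TableRows(HTMLParser):
    """Hypothetical helper: collect the text of each <td>, grouped per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one sublist per <tr>
        self.in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TD> and <td> are treated alike.
        if tag == "tr":
            self.rows.append([])      # start a new row
        elif tag == "td" and self.rows:
            self.rows[-1].append("")  # start a new, empty cell
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data  # accumulate text inside the current cell


# Shortened sample in the shape of the report table (not the full file).
sample = ('<table><tr> <td></td> <td ALIGN="center">A</td> <td ALIGN="center">B</td> </tr>'
          '<tr> <td>  1  </td> <td>  wall : id 469021  </td> <td>  table : id 482259  </td> </tr></table>')

parser = TableRows()
parser.feed(sample)
parser.close()

# Strip the padding around each cell while keeping one sublist per row.
final = [[cell.strip() for cell in row] for row in parser.rows]
print(final)  # [['', 'A', 'B'], ['1', 'wall : id 469021', 'table : id 482259']]
```

The design point is the same as in the loop above: a new sublist is started once per row, and cell text is only ever appended to the most recent sublist.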

Upvotes: 1
