Reputation: 385
I used BeautifulSoup to parse a website and store the content. It is in this form:
records = [[[<p>data_1_1</p>], [<p>data_1_2</p>],[], [<li>data_1_3</li>]],
[[<p>data_2_1</p>], [<p>data_2_2</p>], [], [<li>data_2_3</li>]]]
I am having trouble making this:
records = [["data_1_1", "data_1_2", "data_1_3"],
["data_2_1", "data_2_2", "data_2_3"]]
I tried list comprehensions:
text_records = [sum(record, []) for record in records]
but the text is still wrapped in <p>
or <li>
tags.
text_records = [item.string for item in sum(record, []) for record in records]
takes the text out of tags, but this gives one large list, with the same values repeated multiple times.
I know there is plenty out there on list comprehensions in python, and I've searched SO, but I can't find anything to help with this situation.
Upvotes: 0
Views: 94
Reputation: 6158
Edit - This will work even for multiple items:
[sum([v.string for v in [item for item in record if item]], []) for record in records]
Adding the sum will make sure all the lists are combined into a single one per record.
Original:
This should work fine as long as you will always only have internal lists of a single item:
[[item[0].string for item in row if item] for row in records]
This will go through each record, make sure that the record exists with the if statement, and then append the first element of the list to the new record in it's string format.
Upvotes: 1
Reputation: 784
This will do the job just fine(although this many for loops is annoying,any suggestion is welcome).
records1 = [BeautifulSoup(k).text for i in records for j in i for k in j]
Upvotes: 0