Reputation: 21
Let's say there is a table with the following contents:
<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>
and I want to turn it into this in Python:
>>>pets
[['Dog'],['Cat'],['Mouse'],['Snake'],['Dragon'],['Dinosaur'],['Lizard'],['Owl'],['Falcon'],['Phoenix']]
This is what I have managed so far:
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    pets.append(a)
However, I can't figure out a way to turn
['Dog','Cat','Mouse']
into
['Dog'],['Cat'],['Mouse']
and so on. Please help. I've only been programming for a few days and I'm already stuck. Thanks in advance.
Upvotes: 0
Views: 129
Reputation: 1538
First, you should know that regular expressions (regex) are not always the best tool for parsing data. Here, for instance, all your elements are separated by a comma (,), so the split method is the way to go.
As for wrapping each element in a list of its own, a list comprehension is the easiest way to do it. That said, make sure you really need this: a list of single-element lists usually doesn't make much sense.
Here's a suggested implementation:
elements = table.find_all('td')
pets = []
for e in elements:
    # find_all returns Tag objects, not strings; get_text() gives the cell's text without the <td> and </td> tags
    e_tagless = e.get_text()
    animals = e_tagless.split(',')
    pets += [[animal] for animal in animals]
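The same split-plus-list-comprehension idea can be tried without BeautifulSoup. This is a minimal sketch that assumes the cell texts have already been pulled out as plain strings; the cells list below is hypothetical sample data standing in for the <td> contents.

```python
# Hypothetical sample data: the text inside each <td>, tags already removed
cells = ['Dog,Cat,Mouse', 'Snake,Dragon,Dinosaur,Lizard', 'Owl,Falcon,Phoenix']

# Split each cell on commas, then wrap every animal in its own one-element list
pets = [[animal] for cell in cells for animal in cell.split(',')]

print(pets)
```

The nested comprehension reads left to right like two nested for loops: the outer one walks the cells, the inner one walks the names inside each cell.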
Upvotes: 2
Reputation: 250941
import re
strs = """<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>"""
r = re.compile(r'<td>(.*?)</td>')
print([[x] for m in r.finditer(strs) for x in m.group(1).split(',')])
This prints:
[['Dog'], ['Cat'], ['Mouse'], ['Snake'], ['Dragon'], ['Dinosaur'], ['Lizard'], ['Owl'], ['Falcon'], ['Phoenix']]
It also supports multiple <td>...</td> elements on the same line.
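For comparison, here is a Python 3 sketch of the same regex idea using re.findall, which returns the captured group strings directly instead of match objects. It also strips spaces around each name in case a cell is written like "Snake, Dragon"; that whitespace handling is an assumption about the input, not something the question asks for, and the helper name cells_to_pets is made up for illustration.

```python
import re

# Non-greedy match for the text between <td> and </td>
TD_PATTERN = re.compile(r'<td>(.*?)</td>')

def cells_to_pets(html):
    """Hypothetical helper: one single-element list per comma-separated name."""
    return [[name.strip()]                        # drop spaces around the name
            for cell in TD_PATTERN.findall(html)  # one group, so findall yields strings
            for name in cell.split(',')]

html = """<td>Dog,Cat,Mouse</td>
<td>Snake, Dragon,Dinosaur,Lizard</td><td>Owl,Falcon,Phoenix</td>"""
print(cells_to_pets(html))
```

Because the pattern has exactly one group, findall returns just the text inside each <td>, which keeps the comprehension short.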
Upvotes: 2
Reputation: 1695
Change this:
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    pets.append(a)
To this:
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    # wrap each name in its own single-element list before adding it
    pets.extend([name] for name in a)
You were just missing one step: each item has to be wrapped in its own list ([name]) as you add it during the loop, instead of appending the whole list a at once.
Upvotes: 0
Reputation: 26572
Try this:
>>> my_list = ['Dog','Cat','Mouse']
>>> list(map(lambda x: [x], my_list))
[['Dog'], ['Cat'], ['Mouse']]
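Applied to the whole table: the loop in the question builds one list per cell, such as [['Dog', 'Cat', 'Mouse'], ...]. This minimal sketch assumes that intermediate list (called rows here, filled with hypothetical sample data), flattens it with itertools.chain.from_iterable, and then wraps each name with the same map call. The list() around map matters in Python 3, where map returns a lazy iterator rather than a list.

```python
from itertools import chain

# Hypothetical sample data: what the question's loop collects, one list per <td>
rows = [['Dog', 'Cat', 'Mouse'],
        ['Snake', 'Dragon', 'Dinosaur', 'Lizard'],
        ['Owl', 'Falcon', 'Phoenix']]

# Flatten the per-cell lists into one stream of names,
# then wrap each name in its own single-element list
pets = list(map(lambda x: [x], chain.from_iterable(rows)))

print(pets)
```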
Upvotes: 0