Reputation: 2625
What I need to do is quite simple but I can't figure out how to.
I have a lot of strings organized in a list:
list = ["my name is Marco and i'm 24 years old", "my name is Jhon and i'm 30 years old"]
I use a regex to extract information from each element of the list:
import re

for element in list:
    name = re.findall('my name is (.*?) and i\'m', element, re.DOTALL)[0]
    age = re.findall('and i\'m (.*?) years old', element, re.DOTALL)[0]
Now I want to build a new list whose elements are sub-lists holding a name and an age.
Example:
for element in newlist:
    name = element[0]
    age = element[1]
Is it possible to do something like this?
Upvotes: 1
Views: 312
Reputation: 87084
First of all, you don't need two regular expressions to pluck out the name and the age; a single pattern with two capturing groups does both.
>>> import re
>>> s = "my name is Marco and i'm 24 years old"
>>> pattern = r"my name is\s+(.+)\s+and i'm\s+(\d+)\s+years old"
>>> m = re.match(pattern, s)
>>> print(m.groups())
('Marco', '24')
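If you would rather read the fields by name than by position, the same pattern can use named groups. A minimal sketch; the group names name and age are my own choice, not something the re module requires:

```python
import re

# Same pattern as above, but with named groups (?P<name>...) so each
# captured field can be looked up by name as well as by position.
pattern = r"my name is\s+(?P<name>.+)\s+and i'm\s+(?P<age>\d+)\s+years old"

s = "my name is Marco and i'm 24 years old"
m = re.match(pattern, s)
print(m.group("name"), m.group("age"))  # Marco 24
print(m.groupdict())                    # {'name': 'Marco', 'age': '24'}
print(m.groups())                       # ('Marco', '24'), positional access still works
```

Because groups() still returns the fields in order, a named-group pattern can be dropped into the list comprehension unchanged.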
And you can use a list comprehension to construct the new list:
>>> data = ["my name is Marco and i'm 24 years old", "my name is Jhon and i'm 30 years old"]
>>> new_list = [re.match(pattern, s).groups() for s in data]
>>> print(new_list)
[('Marco', '24'), ('Jhon', '30')]
The result is a list of tuples. If you really need a list of lists, you can do this:
new_list = [list(re.match(pattern, s).groups()) for s in data]
The list comprehension is roughly shorthand for this loop:
new_list = []
for s in data:
    m = re.match(pattern, s)
    if m:
        new_list.append(m.groups())
The main difference between this loop and the list comprehension is that the loop can handle strings that do not match the pattern, whereas the list comprehension assumes that the pattern always matches: if it doesn't, re.match() returns None and calling .groups() on it raises an AttributeError. You can handle this in the list comprehension; however, it starts to get ugly because the obvious way to do it performs the regex match twice: once to check whether the pattern matched, and again to extract the actual values. In this case I think the explicit for loop is cleaner.
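If you do want to stay with a comprehension, one way to avoid matching twice is to call re.match inside a generator expression and filter out the None results. A minimal sketch, with a non-matching string added to the sample data purely for illustration:

```python
import re

pattern = r"my name is\s+(.+)\s+and i'm\s+(\d+)\s+years old"
data = [
    "my name is Marco and i'm 24 years old",
    "this line does not match",  # illustrative non-matching input
    "my name is Jhon and i'm 30 years old",
]

# The inner generator calls re.match exactly once per string;
# the outer comprehension keeps only the successful matches.
new_list = [m.groups() for m in (re.match(pattern, s) for s in data) if m]
print(new_list)  # [('Marco', '24'), ('Jhon', '30')]

# On Python 3.8+, an assignment expression gives the same single-match result:
new_list2 = [m.groups() for s in data if (m := re.match(pattern, s))]
print(new_list2)  # [('Marco', '24'), ('Jhon', '30')]
```

Either form silently drops non-matching strings, so if you need to report or log them, the explicit loop is still the better fit.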
Upvotes: 1
Reputation: 1402
Here is a solution that does exactly what you want: it creates a new list of sub-lists, each holding a name and an age.
new_list = []
for element in list:
    name = re.findall('my name is (.*?) and i\'m', element, re.DOTALL)[0]
    age = re.findall('and i\'m (.*?) years old', element, re.DOTALL)[0]
    new_list.append([name, age])
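A self-contained run of the loop above, for reference. The sample strings are stored under the name strings (my choice, not the question's) so that the built-in list type is not shadowed, and each sub-list is then unpacked into name and age, which is what the question's second snippet does by index:

```python
import re

# Sample data from the question, under a name that does not shadow list().
strings = [
    "my name is Marco and i'm 24 years old",
    "my name is Jhon and i'm 30 years old",
]

new_list = []
for element in strings:
    name = re.findall(r"my name is (.*?) and i'm", element, re.DOTALL)[0]
    age = re.findall(r"and i'm (.*?) years old", element, re.DOTALL)[0]
    new_list.append([name, age])

# Each sub-list can be read by index (element[0], element[1])
# or unpacked directly in the for statement:
for name, age in new_list:
    print(name, age)  # prints "Marco 24", then "Jhon 30"
```

Avoiding a variable called list matters in practice: once list is rebound to a list object, a call such as list(re.match(pattern, s).groups()) fails with a TypeError.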
Upvotes: 1
Reputation: 2826
You can do what you want using a simple list comprehension:
import re

name_pat = re.compile(r"my name is (.*?) and i'm", re.DOTALL)
age_pat = re.compile(r"and i'm (.*?) years old", re.DOTALL)
new_list = [[name_pat.findall(elem)[0], age_pat.findall(elem)[0]] for elem in your_list]
Upvotes: 1