Reputation:
I have a list:
my_list = ['A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
'A70-7600; reprint; rolled; 1986; 26.5 x 38.5',
'A70-6912; reprint; style C; rolled; 1977; 26.5 x 38.5',
'A70-8692; reprint; regular; rolled; 1995; 26.5 x 38.5',
'A70-2978; reprint; rolled; 1991; 26.5 x 38.5',
'A70-4902; reprint; Style A; rolled; 1999; 26.5 x 38.5',
'A70-6300; reprint; regular; rolled; 1983; 26.5 x 38.5',
'MPW-6725; reprint; rolled; 1966; 26.5 x 38']
I want to extract the substrings that contain 'x' (e.g. 26.5 x 38.5). I have tried:
string = [i if 'x' in i else np.nan for i in str(my_string).split(';')]
This places nan wherever the condition isn't met, but it only gets me part of the way there. Is there a way to get the strings I want, both with and without the nan placeholder?
Upvotes: 1
Views: 58
Reputation: 27515
You’ll need a nested list comprehension to get each substring in the list.
[x for s in my_list for x in s.split('; ') if 'x' in x]
Results:
['26.5 x 38.5', '26.5 x 38.5', '22.5 x 34.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38']
Using re would be more appropriate for this, though, as just using if 'x' in x may return unwanted results:
import re
# The decimal part is optional so that '26.5 x 38' also matches.
p = re.compile(r"\d+(?:\.\d+)? x \d+(?:\.\d+)?")
[m.group(0) for m in map(p.search, my_list) if m]
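To illustrate the "unwanted results" point, here is a small sketch. The record below is made up (none of the items in the question has an 'extra large' field), and the pattern is just one reasonable assumption about what a size looks like: a number, ' x ', and another number, each with an optional decimal part.

```python
import re

# Hypothetical record: the 'extra large' field is invented for illustration.
record = 'A70-0001; reprint; extra large; rolled; 1990; 26.5 x 38.5'

# Plain substring test: any field containing the letter 'x' passes.
by_substring = [f.strip() for f in record.split(';') if 'x' in f]

# Pattern test: number, ' x ', number; the decimal parts are optional.
dim = re.compile(r'\d+(?:\.\d+)? x \d+(?:\.\d+)?')
by_pattern = [f.strip() for f in record.split(';') if dim.fullmatch(f.strip())]

print(by_substring)  # ['extra large', '26.5 x 38.5']
print(by_pattern)    # ['26.5 x 38.5']
```

The substring test picks up 'extra large' because it contains an 'x'; the pattern test only keeps fields that consist entirely of a size.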
Upvotes: 3
Reputation: 2209
Here's a regex-based solution. It's more robust than the other solutions offered because it'll work even if the desired string isn't preceded by a ;.
import re
# Decimal parts are optional so that '26.5 x 38' is matched too.
reg = re.compile(r'\b\d+(?:\.\d+)? x \d+(?:\.\d+)?\b')
new_list = []
for elem in my_list:
    result = re.search(reg, elem)
    if result:
        new_list.append(result.group(0))
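The question also asks for a version that keeps a nan placeholder. A minimal sketch of both variants follows, assuming one result per input string is wanted. The helper name extract, the third sample row, and the use of math.nan (instead of np.nan, to avoid needing NumPy) are my own choices for illustration, not part of the question.

```python
import math
import re

# Two rows from the question plus one invented row without a size.
rows = [
    'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
    'X00-0000; reprint; rolled; 1970',  # hypothetical, no dimensions
]

# Number, ' x ', number; the decimal parts are optional.
dim = re.compile(r'\d+(?:\.\d+)? x \d+(?:\.\d+)?')

def extract(row, missing=math.nan):
    """Return the first 'W x H' substring in row, or `missing` if there is none."""
    m = dim.search(row)
    return m.group(0) if m else missing

# One entry per row, nan where no size was found.
with_placeholder = [extract(r) for r in rows]

# Only the rows that actually contain a size.
without_placeholder = [m.group(0) for m in map(dim.search, rows) if m]

print(with_placeholder)
print(without_placeholder)
```

Keeping the placeholder preserves the alignment with the original list, which helps if the result is going to become a column next to the source data.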
Upvotes: 0
Reputation: 37771
outputs = [subitem for item in my_list for subitem in item.split(';') if 'x' in subitem]
print(outputs)
Outputs:
[' 26.5 x 38.5', ' 26.5 x 38.5', ' 22.5 x 34.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38']
Upvotes: 0
Reputation: 2569
Like this:
string = [i for my_string in my_list for i in my_string.split(';') if 'x' in i]
Upvotes: 1
Reputation: 631
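Yes, if you want to extract only the strings that contain 'x', you can do:
# Join with ';' so the last field of one item doesn't fuse with the first field of the next.
sep = ';'.join(my_list).split(';')
with_x = filter(lambda str_: 'x' in str_, sep)
for i in with_x:
    print(i.strip())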
Upvotes: 0
Reputation: 3534
Using a list comprehension for this may get ugly, so I'd recommend two nested for loops for readability.
my_list = ['A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
'A70-7600; reprint; rolled; 1986; 26.5 x 38.5',
'A70-6912; reprint; style C; rolled; 1977; 26.5 x 38.5',
'A70-8692; reprint; regular; rolled; 1995; 26.5 x 38.5',
'A70-2978; reprint; rolled; 1991; 26.5 x 38.5',
'A70-4902; reprint; Style A; rolled; 1999; 26.5 x 38.5',
'A70-6300; reprint; regular; rolled; 1983; 26.5 x 38.5',
'MPW-6725; reprint; rolled; 1966; 26.5 x 38']
multiplications = []
for item in my_list:
    for subitem in item.split(';'):
        if 'x' in subitem:
            multiplications.append(subitem.strip())
print('\n'.join(multiplications))
This outputs:
26.5 x 38.5
26.5 x 38.5
22.5 x 34.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38
Upvotes: 1