
String extraction from a list

I have a list:

my_list = ['A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
 'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
 'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
 'A70-7600; reprint; rolled; 1986; 26.5 x 38.5',
 'A70-6912; reprint; style C; rolled; 1977; 26.5 x 38.5',
 'A70-8692; reprint; regular; rolled; 1995; 26.5 x 38.5',
 'A70-2978; reprint; rolled; 1991; 26.5 x 38.5',
 'A70-4902; reprint; Style A; rolled; 1999; 26.5 x 38.5',
 'A70-6300; reprint; regular; rolled; 1983; 26.5 x 38.5',
 'MPW-6725; reprint; rolled; 1966; 26.5 x 38']

I want to extract the strings that contain 'x' (e.g. 26.5 x 38.5). I have tried:

string = [i if 'x' in i else np.nan for i in str(my_string).split(';')]

This places nan wherever the condition isn't met, but I'm only part way there. Is there a way to get the strings I want, both with and without the nan placeholder?
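A minimal sketch of both variants. It assumes the placeholder version should give one result per list item, with NaN where an item has no field containing 'x'. It uses math.nan, which is the same float NaN value as np.nan, so numpy is not needed. size_field is a hypothetical helper name, and the third sample entry is made up to show the placeholder.

```python
import math

my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
    'MPW-6725; reprint; rolled; 1966',  # hypothetical entry with no size field
]

def size_field(item, default=math.nan):
    """Return the first ';'-separated field containing 'x', else default (hypothetical helper)."""
    for field in item.split(';'):
        if 'x' in field:
            return field.strip()
    return default

# One entry per list item, NaN where no field contains 'x'.
with_nan = [size_field(item) for item in my_list]

# Only the matching fields, without any placeholder.
without_nan = [s for s in with_nan if isinstance(s, str)]

print(with_nan)
print(without_nan)
```

Passing np.nan as the default instead of math.nan gives the same result if numpy is already in use.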

Upvotes: 1

Answers (6)

Jab

Reputation: 27515

You’ll need a nested list comprehension to get each substring in the list.

[x for s in my_list for x in s.split('; ') if 'x' in x]

Results:

['26.5 x 38.5', '26.5 x 38.5', '22.5 x 34.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38.5', '26.5 x 38']

Using re would be more appropriate for this, though, because the test if 'x' in x also matches any other field that happens to contain the letter x (for example a word such as 'box'):

import re
p = re.compile(r"\d+(?:\.\d+)? x \d+(?:\.\d+)?")  # decimal part optional, so '26.5 x 38' matches too
[m.group(0) for m in map(p.search, my_list) if m]
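A small comparison of the substring test and a regex, as a sketch. The first entry is hypothetical (the 'box set' field is not in the question's data) and exists only to show a false positive. The pattern here makes the decimal part optional so that '26.5 x 38' is also found.

```python
import re

items = [
    'A70-1234; reprint; box set; rolled; 1990; 26.5 x 38.5',  # hypothetical entry
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
]

# Substring test: any field containing the letter 'x' is kept.
by_substring = [f for s in items for f in s.split('; ') if 'x' in f]

# Regex: a number, ' x ', a number; the decimal parts are optional.
size_re = re.compile(r"\d+(?:\.\d+)? x \d+(?:\.\d+)?")
by_regex = [m.group(0) for m in map(size_re.search, items) if m]

print(by_substring)  # ['box set', '26.5 x 38.5', '26.5 x 38']
print(by_regex)      # ['26.5 x 38.5', '26.5 x 38']
```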

Upvotes: 3

ba_ul

Reputation: 2209

Here's a regex-based solution. It's more robust than the other solutions offered because it works even if the desired string isn't preceded by a ';'.

import re

# Decimal parts are optional so that sizes like '26.5 x 38' also match.
reg = re.compile(r'\b\d+(?:\.\d+)? x \d+(?:\.\d+)?\b')

new_list = []

for elem in my_list:
    result = reg.search(elem)
    if result:
        new_list.append(result.group(0))
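To illustrate the claim about strings that aren't split on ';', here is a sketch run on made-up inputs (none of them come from the question) with a pattern whose decimal parts are optional:

```python
import re

# Word-bounded "number x number"; decimal parts are optional.
reg = re.compile(r'\b\d+(?:\.\d+)? x \d+(?:\.\d+)?\b')

# Hypothetical inputs: the size is not a separate ';' field here.
samples = [
    'A70-11370 reprint rolled 2000 26.5 x 38.5',
    'size 22.5 x 34.5, rolled',
    'no dimensions here',
]

found = []
for elem in samples:
    result = reg.search(elem)
    if result:
        found.append(result.group(0))

print(found)  # ['26.5 x 38.5', '22.5 x 34.5']
```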

Upvotes: 0

Wasi Ahmad

Reputation: 37771

outputs = [subitem for item in my_list for subitem in item.split(';') if 'x' in subitem]
print(outputs)

Outputs:

[' 26.5 x 38.5', ' 26.5 x 38.5', ' 22.5 x 34.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38.5', ' 26.5 x 38']
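The leading spaces come from the space after each ';'. A sketch that strips them, using a shortened sample of the question's list:

```python
my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
]

# str.strip() removes the whitespace left over from each ';' separator.
outputs = [subitem.strip()
           for item in my_list
           for subitem in item.split(';')
           if 'x' in subitem]
print(outputs)  # ['26.5 x 38.5', '26.5 x 38']
```

Stripping after splitting on ';' also tolerates inconsistent spacing, which splitting on '; ' would not.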

Upvotes: 0

Florian Bernard

Reputation: 2569

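Move the condition to the end of the comprehension so that it filters instead of substituting nan, and loop over every item of my_list: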

string = [i for my_string in my_list for i in str(my_string).split(';') if 'x' in i ]

Upvotes: 1

juancarlos

Reputation: 631

If you want to extract only the strings that contain 'x', you can do:

sep = ';'.join(my_list).split(';')  # join with ';' so neighbouring items don't run together

with_x = filter(lambda str_: 'x' in str_, sep)

for i in with_x:
    print(i)
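The separator passed to join matters. A sketch with two items from the question showing that joining with '' fuses the last field of one item with the first field of the next, while joining with ';' keeps them apart:

```python
my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
]

# Joining with '' glues '26.5 x 38.5' onto the next item's ID.
print(''.join(my_list).split(';')[4])  # ' 26.5 x 38.5A70-713'

# Joining with ';' keeps every field separate; strip() drops the leading space.
fields = [f.strip() for f in ';'.join(my_list).split(';')]
with_x = [f for f in fields if 'x' in f]
print(with_x)  # ['26.5 x 38.5', '26.5 x 38.5']
```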

Upvotes: 0

kingkupps

Reputation: 3534

Using a list comprehension for this may get ugly, so I'd recommend two nested for loops for readability.

my_list = ['A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
 'A70-713; reprint; rolled; 1980; 26.5 x 38.5',
 'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
 'A70-7600; reprint; rolled; 1986; 26.5 x 38.5',
 'A70-6912; reprint; style C; rolled; 1977; 26.5 x 38.5',
 'A70-8692; reprint; regular; rolled; 1995; 26.5 x 38.5',
 'A70-2978; reprint; rolled; 1991; 26.5 x 38.5',
 'A70-4902; reprint; Style A; rolled; 1999; 26.5 x 38.5',
 'A70-6300; reprint; regular; rolled; 1983; 26.5 x 38.5',
 'MPW-6725; reprint; rolled; 1966; 26.5 x 38']


multiplications = []
for item in my_list:
    for subitem in item.split(';'):
        if 'x' in subitem:
            multiplications.append(subitem.strip())

print('\n'.join(multiplications))

This outputs:

26.5 x 38.5
26.5 x 38.5
22.5 x 34.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38.5
26.5 x 38
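If the loops are needed in more than one place, they could be wrapped in a generator function. This is a sketch only: size_fields is a hypothetical name and the sample list is shortened from the question's.

```python
from typing import Iterable, Iterator

def size_fields(items: Iterable[str], sep: str = ';') -> Iterator[str]:
    """Yield each stripped field that contains 'x' (hypothetical helper)."""
    for item in items:
        for subitem in item.split(sep):
            if 'x' in subitem:
                yield subitem.strip()

my_list = [
    'A70-11370; reprint; rolled; 2000; 26.5 x 38.5',
    'b70-7814; reprint; Style A; rolled; 1939; 22.5 x 34.5',
    'MPW-6725; reprint; rolled; 1966; 26.5 x 38',
]

multiplications = list(size_fields(my_list))
print('\n'.join(multiplications))
```

Because it is a generator, callers that only need the first match, or want to stream a large list, don't have to build the whole result list.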

Upvotes: 1
