How to filter list items in Series returned by Series.split()

Question

I'd like to filter the lists that the split method returns. split().str[i] won't work because a list can contain more than one match, and the position of the match varies.

What would be a Pythonic way to do it? Is it possible to avoid iteration?

Also, I don't understand why the str prefix is used when accessing the list index: split().str[i]

In this case I'd like to get only the list items that contain the substring 'Kg':

   column_kg = df[column].str.split()


0                         ['aaa', 'bbb', 'ccc', '3Kg']
1                         ['aaa', 'bbb', 'ccc', '2.5Kg']
2                         ['aaa', 'bbb', 'ccc', '1Kg', '34kg']
3                         ['aaa', 'bbb', '45Kg', 'dddd']
4                         ['aaa', 'bbb', 'ccc', '0.5Kg']

jezrael · Accepted Answer

I don't think iteration can be avoided here, because all the .str methods use loops under the hood.

To keep only the values that contain Kg in each list, you can use a list comprehension:

df1 = df[column].str.split().apply(lambda x: [y for y in x if 'Kg' in y])
print(df1)
0      [3Kg]
1    [2.5Kg]
2      [1Kg]
3     [45Kg]
4    [0.5Kg]
Name: a, dtype: object

Or:

df1 = df[column].apply(lambda x: [y for y in x.split() if 'Kg' in y])
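
For anyone who wants to run this, here is a self-contained sketch of both variants. The sample rows are reconstructed from the question and the column name 'a' is taken from the printed output, so treat both as assumptions. It also shows two things not in the code above: a case-insensitive check (the plain 'Kg' test drops '34kg' in row 2), and Series.str.findall with a regular expression as an alternative that avoids the lambda, although it still loops internally.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption, not the asker's real frame).
df = pd.DataFrame({'a': ['aaa bbb ccc 3Kg',
                         'aaa bbb ccc 2.5Kg',
                         'aaa bbb ccc 1Kg 34kg',
                         'aaa bbb 45Kg dddd',
                         'aaa bbb ccc 0.5Kg']})
column = 'a'

# Variant 1: split with the .str accessor, then filter each resulting list.
via_str_split = df[column].str.split().apply(lambda x: [y for y in x if 'Kg' in y])

# Variant 2: split and filter inside a single apply call.
via_apply = df[column].apply(lambda x: [y for y in x.split() if 'Kg' in y])

# Case-insensitive check, so that '34kg' is kept as well.
any_case = df[column].apply(lambda x: [y for y in x.split() if 'kg' in y.lower()])

# Alternative technique: a regex that matches whitespace-free tokens containing 'Kg'.
via_findall = df[column].str.findall(r'\S*Kg\S*')

print(via_str_split.tolist())
print(any_case.tolist())
```

All four results are Series of lists, so the usual caveat applies: each row is still processed by a Python-level loop.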

"Also, I don't understand why the prefix str is used when accessing the list index: split().str[i]."

The "Splitting and replacing strings" section of the pandas documentation uses this method to select the second element after splitting:

In [33]: s2.str.split('_').str[1]
Out[33]: 
0       b
1       d
2     NaN
3       g
dtype: object

But because strings, lists and tuples can all be indexed by position, the same method that selects the second character of a string also selects the second value of a list (or tuple).
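
A small sketch of that behaviour, using made-up Series (the variable names and values are illustrative only). It assumes a pandas version where the .str accessor accepts object Series holding lists or tuples, which is what the split().str[i] idiom relies on.

```python
import pandas as pd

# Illustrative data: one Series each of strings, lists and tuples.
strings = pd.Series(['abc', 'def'])
lists = pd.Series([['a', 'b', 'c'], ['c', 'd', 'e']])
tuples = pd.Series([('a', 'b'), ('c', 'd')])

# .str[1] indexes into every element, whatever kind of sequence it is.
second_chars = strings.str[1]
second_of_lists = lists.str[1]
second_of_tuples = tuples.str[1]

# A position that does not exist in an element yields a missing value
# instead of raising IndexError.
ragged = pd.Series([['a'], ['a', 'b']])
second_of_ragged = ragged.str[1]

print(second_chars.tolist())
print(second_of_lists.tolist())
print(second_of_tuples.tolist())
print(second_of_ragged.tolist())
```

This is also why .str[i] does not fit the question: it picks one fixed position per row, while the 'Kg' items sit at different positions and can occur more than once.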
