Aleix

Reputation: 451

How to filter list items in Series returned by Series.split()

I'd like to filter the lists that the split method returns. split().str[i] won't work because a list can contain more than one match and the index position of each match varies.

What would be a Pythonic way to do it? Is it possible to avoid iteration?

Also, I don't understand why the prefix str is used when accessing the list index: split().str[i]

In this case I'd like to keep only the list items that contain the substring 'Kg':

   column_kg = df[column].str.split()


0                         ['aaa', 'bbb', 'ccc', '3Kg']
1                         ['aaa', 'bbb', 'ccc', '2.5Kg']
2                         ['aaa', 'bbb', 'ccc', '1Kg', '34kg']
3                         ['aaa', 'bbb', '45Kg', 'dddd']
4                         ['aaa', 'bbb', 'ccc', '0.5Kg']

Upvotes: 1

Views: 331

Answers (1)

jezrael

Reputation: 862681

I don't think it is possible to avoid iteration here, because all .str methods use loops under the hood.

If you want to keep only the values containing Kg in each list, you can use a list comprehension:
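To illustrate that point, here is a minimal sketch of an alternative that has no explicit lambda but still iterates internally: Series.str.findall with a regular expression. The sample data, the column name "a" and the pattern are assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical data shaped like the strings in the question.
df = pd.DataFrame({"a": [
    "aaa bbb ccc 3Kg",
    "aaa bbb ccc 2.5Kg",
    "aaa bbb ccc 1Kg 34kg",
    "aaa bbb 45Kg dddd",
    "aaa bbb ccc 0.5Kg",
]})

# \S*Kg\S* matches every whitespace-delimited token that contains 'Kg'.
# The match is case-sensitive, so '34kg' is not returned.
found = df["a"].str.findall(r"\S*Kg\S*")
print(found.tolist())
```

This reads more compactly, but pandas still loops over the rows in Python, so it should not be expected to be faster than the list comprehension.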

df1 = df[column].str.split().apply(lambda x: [y for y in x if 'Kg' in y])
print(df1)
0      [3Kg]
1    [2.5Kg]
2      [1Kg]
3     [45Kg]
4    [0.5Kg]
Name: a, dtype: object

Or:

df1 = df[column].apply(lambda x: [y for y in x.split() if 'Kg' in y])
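For anyone who wants to run this end to end, the following is a self-contained sketch. The DataFrame contents and the column name "a" are assumed from the output shown above; the case-insensitive variant is an addition, in case '34kg' should be kept as well.

```python
import pandas as pd

# Hypothetical sample data reconstructed from the question.
df = pd.DataFrame({"a": [
    "aaa bbb ccc 3Kg",
    "aaa bbb ccc 2.5Kg",
    "aaa bbb ccc 1Kg 34kg",
    "aaa bbb 45Kg dddd",
    "aaa bbb ccc 0.5Kg",
]})
column = "a"

# Case-sensitive filter: keeps tokens containing 'Kg', drops '34kg'.
kg_only = df[column].apply(lambda s: [tok for tok in s.split() if "Kg" in tok])

# Case-insensitive variant: lowercases each token before testing it.
kg_any_case = df[column].apply(
    lambda s: [tok for tok in s.split() if "kg" in tok.lower()]
)

print(kg_only.tolist())
print(kg_any_case.tolist())
```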

"Also, I don't understand why the prefix str is used when accessing the list index: split().str[i]."

If you check the section "Splitting and replacing strings" in the pandas documentation, this method is used there to select the second value of each split string:

In [33]: s2.str.split('_').str[1]
Out[33]: 
0       b
1       d
2    <NA>
3       g
dtype: object

Because strings, lists and tuples are all sequences that support indexing, the same .str[i] accessor can also be used to select the second value of a list (or tuple).
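A short sketch of that behaviour, using made-up data similar to the documentation example (the Series contents here are assumptions):

```python
import pandas as pd

# Hypothetical data; None stands in for a missing value.
s = pd.Series(["a_b_c", "c_d_e", None, "f_g_h"])

parts = s.str.split("_")   # each element is now a list (or missing)
second = parts.str[1]      # .str[i] indexes into each list element-wise
first_char = s.str[0]      # the same accessor indexes into plain strings

# It also works on a Series of tuples.
pairs = pd.Series([(1, 2), (3, 4)])
second_of_pair = pairs.str[1]

print(second.tolist())
print(first_char.tolist())
print(second_of_pair.tolist())
```

So the str prefix is there because indexing a Series directly with [i] selects a row, while .str[i] applies the index to every element inside the Series.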

Upvotes: 2
