Reputation: 451
I'd like to filter the lists returned by the split method. split().str[i]
wouldn't work because there is more than one occurrence per list and its index position varies.
What would be a Pythonic way to do it? Is it possible to avoid iteration?
Also, I don't understand why the prefix str
is used when accessing the list index: split().str[i]
In this case I'd like to get only the list items that contain the substring 'Kg':
column_kg = df[column].str.split()
0 ['aaa', 'bbb', 'ccc', '3Kg']
1 ['aaa', 'bbb', 'ccc', '2.5Kg']
2 ['aaa', 'bbb', 'ccc', '1Kg', '34kg']
3 ['aaa', 'bbb', '45Kg', 'dddd']
4 ['aaa', 'bbb', 'ccc', '0.5Kg']
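For reference, sample data like the above could be built as follows (this frame is only an assumption for illustration; the column name a is taken from the answer's output below):
import pandas as pd

# hypothetical reconstruction of the example data shown above
df = pd.DataFrame({'a': ['aaa bbb ccc 3Kg',
                         'aaa bbb ccc 2.5Kg',
                         'aaa bbb ccc 1Kg 34kg',
                         'aaa bbb 45Kg dddd',
                         'aaa bbb ccc 0.5Kg']})
column = 'a'
column_kg = df[column].str.split()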
Upvotes: 1
Views: 331
Reputation: 862681
I think it is not possible to avoid iteration here, because all str
methods use loops under the hood.
If you want to filter the values containing Kg
in the lists, you can use a list comprehension:
df1 = df[column].str.split().apply(lambda x: [y for y in x if 'Kg' in y])
print (df1)
0 [3Kg]
1 [2.5Kg]
2 [1Kg]
3 [45Kg]
4 [0.5Kg]
Name: a, dtype: object
Or:
df1 = df[column].apply(lambda x: [y for y in x.split() if 'Kg' in y])
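If a single value per row is preferred instead of a one-element list, the positional str indexer from the second part of the question can be chained onto the filtered result. This is just a sketch and assumes each row keeps at most one matching item:
# take the first (and only) matching item from each filtered list
kg_values = df1.str[0]
print (kg_values)
0 3Kg
1 2.5Kg
2 1Kg
3 45Kg
4 0.5Kg
Name: a, dtype: object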
Also, I don't understand why the prefix str is used when accessing the list index: split().str[i].
If you check Splitting and replacing strings in the documentation, this method is used there to select the second value of each string:
In [33]: s2.str.split('_').str[1]
Out[33]:
0 b
1 d
2 <NA>
3 g
dtype: object
But because strings, lists, and tuples are all iterables, the same method can be used to select the second value of a list (or tuple).
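A small made-up example of that behaviour: the positional str indexer works element-wise on lists exactly as it does on strings:
s = pd.Series([['aaa', 'bbb', '3Kg'], ['aaa', '2.5Kg']])
# select the second element of each list
print (s.str[1])
0 bbb
1 2.5Kg
dtype: object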
Upvotes: 2