Reputation: 34934
I have a DataFrame where one of the columns contains a string of comma-delimited words.
>>> df['column1']
# ....
996 str1, str2, str3
997 str4, str5, str7
998 str8, str9, str10
# ...........
I need to treat the content of that column as an array of strings so that I can do this:
[
# .....
& (df['column1'].isin('str2')) # should return the row #996
# ....
]
I tried this but it hasn't panned out, of course:
[
# .....
& (df['column1'].split(',').isin('str2'))
# ....
]
How can I do that? Or rather how can I use a method (lambda) to modify the content of the column before filtering?
UPDATE1:
This is a part of my code:
for x in pd.read_csv.....
    df_item = x
    if filter1:
        df_item = df_item[(df_item['column1'] == filter1)]
    if filter2:
        df_item = df_item[(df_item['column2'].isin(subjects))]
    # .....
How can I apply df['column2'].apply(lambda x: 'str2' in x.split(',')) to
    if filter2:
        df_item = df_item[(df_item['column2'].isin(subjects))]
Upvotes: 2
Views: 11556
Reputation: 90989
isin checks whether each value from the series is in the iterable you pass (in your case 'str2'), not whether str2 is contained in the series' value.
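To make that distinction concrete, here is a minimal sketch on made-up data modelled on the question's column (the DataFrame contents are an assumption for illustration). Note that current pandas versions expect a list-like argument for isin, so the candidate is wrapped in a list.

```python
import pandas as pd

# Hypothetical sample data; the last row holds exactly 'str2'.
df = pd.DataFrame({'column1': ['str1, str2, str3', 'str4, str5, str7', 'str2']})

# isin compares each whole cell against the candidates in the list,
# so only a cell that equals 'str2' exactly is matched.
isin_mask = df['column1'].isin(['str2'])
print(isin_mask.tolist())  # [False, False, True]
```

The first row contains str2 as one of its words, but the cell as a whole is not equal to 'str2', so it is not selected.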
If your series contains strings, then one way to get what you want is to use .str.contains() to check whether each string contains str2. Example:
df['column1'].str.contains('str2')
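A small sketch of what this returns, again on assumed sample data. The third row is deliberately changed to end in 'str20' to show that a substring check also matches longer words; regex=False is passed so the pattern is treated as a literal string.

```python
import pandas as pd

# Hypothetical sample data; 'str20' is included to show the substring pitfall.
df = pd.DataFrame({'column1': ['str1, str2, str3', 'str4, str5, str7', 'str8, str9, str20']})

# Literal substring test on every cell.
contains_mask = df['column1'].str.contains('str2', regex=False)
print(contains_mask.tolist())  # [True, False, True]
```

The last row is matched even though it has no word equal to 'str2', which is the case where splitting becomes necessary.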
If you must split the contents on ',' (that is, if str2 can be a substring of any of the other strings), you can use Series.apply. Example:
df['column1'].apply(lambda x: 'str2' in x.split(','))
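One caveat worth checking against your real data: if the values have a space after each comma, as the sample in the question appears to, a plain split(',') leaves that space on the pieces and the membership test fails. The sketch below assumes such data and uses a hypothetical helper, has_token, that strips each piece before comparing.

```python
import pandas as pd

# Hypothetical sample data with a space after each comma.
df = pd.DataFrame({'column1': ['str1, str2, str3', 'str4, str5, str7', 'str8, str9, str20']})

def has_token(cell, token):
    """Return True if token is one of the comma-separated words in cell."""
    return token in [part.strip() for part in cell.split(',')]

# Plain split keeps the leading space (' str2'), so nothing matches here.
naive_mask = df['column1'].apply(lambda x: 'str2' in x.split(','))
# Stripping each piece gives an exact word match.
token_mask = df['column1'].apply(lambda x: has_token(x, 'str2'))

print(naive_mask.tolist())  # [False, False, False]
print(token_mask.tolist())  # [True, False, False]
```

If the real column has no whitespace around the commas, the plain split version works as written.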
To apply this, use the resulting boolean Series to filter the DataFrame. Example:
if <somefilter>:
    df_item = df_item[df_item['column2'].apply(lambda x: 'str2' in x.split(','))]
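If the filter is a list of values, as with subjects in the question, the same idea might be extended as in the following sketch. The sample DataFrame, the contents of subjects, and the helper matches_any are all assumptions made for illustration; a row is kept when any of its comma-separated words appears in subjects.

```python
import pandas as pd

# Hypothetical stand-ins for one chunk of the question's loop.
df_item = pd.DataFrame({
    'column1': ['a', 'b', 'c'],
    'column2': ['str1, str2, str3', 'str4, str5, str7', 'str8, str9, str10'],
})
subjects = ['str2', 'str9']  # assumed filter values
filter2 = True

wanted = set(subjects)

def matches_any(cell):
    """Return True if any comma-separated word in cell is in wanted."""
    tokens = {part.strip() for part in cell.split(',')}
    return not wanted.isdisjoint(tokens)

if filter2:
    # Keep only the rows whose column2 shares a word with subjects.
    df_item = df_item[df_item['column2'].apply(matches_any)]

print(df_item['column1'].tolist())  # ['a', 'c']
```

Using sets keeps the per-row check short and avoids a nested loop over subjects.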
Upvotes: 8