Filtering in pandas - how to apply a custom method (lambda)?

Question

I have a DataFrame where one of the columns contains strings made up of comma-delimited words.

>>> df['column1']
# ....
996                  str1, str2, str3
997                  str4, str5, str7
998                  str8, str9, str10
# ...........

I need to treat the content of that column as an array of strings so I can do this:

 [
  # ..... 
  & (df['column1'].isin('str2')) # should return the row #996
  # ....
 ]

I tried this but it hasn't panned out, of course:

 [
  # ..... 
  & (df['column1'].split(',').isin('str2'))
  # ....
 ]

How can I do that? Or rather, how can I use a method (lambda) to modify the content of the column before filtering?

UPDATE1:

This is a part of my code:

for x in pd.read_csv.....
      df_item = x

      if filter1:
        df_item = df_item[(df_item['column1'] == filter1)]

      if filter2:
        df_item = df_item[(df_item['column2'].isin(subjects))]

      # .....

How can I apply df['column2'].apply(lambda x: 'str2' in x.split(',')) to

  if filter2:
    df_item = df_item[(df_item['column2'].isin(subjects))]

Anand S Kumar · Accepted Answer

isin checks whether each value of the Series is in the iterable you pass (in your case 'str2'), not whether 'str2' is contained in the Series' value.
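A minimal sketch of that behaviour, using a small made-up Series modelled on the sample rows in the question (the variable names are only illustrative). Note that recent pandas versions also expect a list-like argument, so the candidate values are passed as a list here:

```python
import pandas as pd

# Hypothetical data shaped like the question's column1
s = pd.Series(["str1, str2, str3", "str4, str5, str7"])

# isin compares each whole cell against the candidate values,
# so no cell is equal to the single word "str2"
whole_cell_match = s.isin(["str2"])

# A cell only matches when the complete string is one of the candidates
exact_match = s.isin(["str4, str5, str7"])

print(whole_cell_match.tolist())  # [False, False]
print(exact_match.tolist())       # [False, True]
```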

If your series contains strings, then one way to get what you want is to use .str.contains() to check whether each string contains str2 as a substring. Example -

df['column1'].str.contains('str2')

If you must split the contents on ',' (that is, if str2 can be a substring of one of the other strings), you can use Series.apply. Example -

df['column1'].apply(lambda x: 'str2' in x.split(','))
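The difference between the two approaches can be sketched as follows. The data is made up to mirror the question's sample output, and has_token is a hypothetical helper, not a pandas function. One assumption worth checking against your real data: if the values contain a space after each comma, as the printed sample suggests, a plain x.split(',') leaves that space on every word after the first, so the words may need stripping before comparison.

```python
import pandas as pd

# Hypothetical data shaped like the question's column1
s = pd.Series(["str1, str2, str3", "str4, str5, str7", "str8, str9, str10"])

def has_token(cell, token):
    """Return True if token is one of the comma-separated words in cell."""
    # strip() removes the space that follows each comma in this sample data
    return token in [part.strip() for part in cell.split(",")]

# Substring test: "str1" is also found inside "str10"
substring_mask = s.str.contains("str1", regex=False)

# Whole-word test: only the first row really contains the word "str1"
token_mask = s.apply(lambda cell: has_token(cell, "str1"))

# Without stripping, "str2" is not found because the split piece is " str2"
naive_mask = s.apply(lambda cell: "str2" in cell.split(","))

print(substring_mask.tolist())  # [True, False, True]
print(token_mask.tolist())      # [True, False, False]
print(naive_mask.tolist())      # [False, False, False]
```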

To apply this in your code, use the resulting boolean Series to filter the DataFrame. Example -

if filter2:
    df_item = df_item[df_item['column2'].apply(lambda x: 'str2' in x.split(','))]
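Since the original filter used isin(subjects), you may want to keep rows where any of several words appears in the cell. A rough sketch of one way to do that, assuming subjects is a list of words and that the cells may have spaces after the commas; the DataFrame contents, the subjects values, and the matches_any helper are invented for illustration:

```python
import pandas as pd

# Hypothetical chunk, standing in for one piece of the read_csv loop
df_item = pd.DataFrame({
    "column1": ["a", "b", "a"],
    "column2": ["str1, str2, str3", "str4, str5, str7", "str8, str9, str10"],
})

subjects = ["str2", "str10"]   # assumed shape of the question's `subjects`
filter2 = True                 # assumed flag from the question's code
wanted = set(subjects)

def matches_any(cell):
    """Return True if the cell shares at least one word with `wanted`."""
    tokens = {part.strip() for part in cell.split(",")}
    return not wanted.isdisjoint(tokens)

if filter2:
    # apply returns a boolean Series, which is then used as a row mask
    df_item = df_item[df_item["column2"].apply(matches_any)]

print(df_item.index.tolist())  # [0, 2]
```

Because the mask is an ordinary boolean Series, it can also be combined with other conditions using & inside a single indexing expression.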
