meh.pandalala
meh.pandalala

Reputation: 43

How to delete row in pandas dataframe based on condition if string is found in cell value of type list?

I've been struggling with the following issue that sounds very easy in fact but can't seem to figure it out and I'm sure it's something very obvious in the stacktrace but I'm just being dumb.

I simply have a pandas dataframe looking like this:

dataframe

And want to drop the rows that contain, in the jpgs cell value (list), the value "123.jpg". So normally I would get the final dataframe with only rows of index 1 and 3.

However I've tried a lot of methods and none of them works.

For example:

df = df["123.jpg" not in df.jpgs]

or

df = df[df.jpgs.tolist().count("123.jpg") == 0]

give error KeyError: True:

err

df = df[df['jpgs'].str.contains('123.jpg') == False]

Returns an empty dataframe:

err2

df = df[df.jpgs.count("123.jpg") == 0]

And

df = df.drop(df["123.jpg" in df.jpgs].index)

Gives KeyError: False:

err

This is my entire code if needed, and I would really appreciate if someone would help me with an answer to what I'm doing wrong :( . Thanks!!

import pandas as pd

df = pd.DataFrame(columns=["person_id", "jpgs"])

id = 1
pair1 = ["123.jpg", "124.jpg"]
pair2 = ["125.jpg", "300.jpg"]
pair3 = ["500.jpg", "123.jpg"]
pair4 = ["111.jpg", "122.jpg"]
row1 = {'person_id': id, 'jpgs': pair1}
row2 = {'person_id': id, 'jpgs': pair2}
row3 = {'person_id': id, 'jpgs': pair3}
row4 = {'person_id': id, 'jpgs': pair4}

df = df.append(row1, ignore_index=True)
df = df.append(row2, ignore_index=True)
df = df.append(row3, ignore_index=True)
df = df.append(row4, ignore_index=True)
print(df)

#df = df["123.jpg" not in df.jpgs]
#df = df[df['jpgs'].str.contains('123.jpg') == False]

#df = df[df.jpgs.tolist().count("123.jpg") == 0]
df = df.drop(df["123.jpg" in df.jpgs].index)
print("\n Final df")
print(df)

Upvotes: 2

Views: 3907

Answers (3)

mcsoini
mcsoini

Reputation: 6642

Since you filter on a list column, apply lambda would probably be the easiest:

df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]

Quick comments on your attempts:

  • In df = df.drop(df["123.jpg" in df.jpgs].index) you are checking whether the exact value "123.jpg" is contained in the column ("123.jpg" in df.jpgs) rather than in any of the lists, which is not what you want.

  • In df = df[df['jpgs'].str.contains('123.jpg') == False] goes in the right direction, but you are missing the regex=False keyword, as shown in Ibrahim's answer.

  • df[df.jpgs.count("123.jpg") == 0] is also not applicable here, since count returns the total number of non-NaN values in the Series.

Upvotes: 4

Omar Ashraf
Omar Ashraf

Reputation: 154

You can try this:

mask = df.jpgs.apply(lambda x: '123.jpg' not in x)
df = df[mask]

Upvotes: 1

Ibrahim
Ibrahim

Reputation: 844

For str.contains one this is how it is done

df[df.jpgs.str.contains("123.jpg", regex=False)]

Upvotes: 3

Related Questions