Reputation: 25
I'm trying to evaluate some images based on the classification. I use the piece of code below to read the csv file:
import pandas as pd
file = pd.read_csv('test.csv', header=None)
So I have something that looks like this:
Image1 2 3 4 5 Green
Image1 3 4 5 6 Red
Image2 4 5 6 7 Red
Image3 1 4 8 9 Green
Image4 5 3 0 1 Yellow
Image4 6 2 1 1 Green
So in case I want to keep the images with the value "Green" the output should look like this:
Image1 2 3 4 5 Green
Image1 3 4 5 6 Red
Image3 1 4 8 9 Green
Image4 5 3 0 1 Yellow
Image4 6 2 1 1 Green
which means that I want to keep the images with the same id in the first column when there is at least one with the element I check is in the last column.
I used the isin
method but I don't know how to keep the images the rest of the rows with the images that do have at least on time the value "Green" in the last column.
Upvotes: 0
Views: 47
Reputation: 42946
We can use GroupBy.any
here, where we check if any of the rows suffice our condition:
df[df[5].eq("Green").groupby(df[0]).transform("any")]
0 1 2 3 4 5
0 Image1 2 3 4 5 Green
1 Image1 3 4 5 6 Red
3 Image3 1 4 8 9 Green
4 Image4 5 3 0 1 Yellow
5 Image4 6 2 1 1 Green
Upvotes: 1
Reputation: 51425
You can use loc
to find the values in the first column where the 6th column is Green
, and use that as your values to pass to isin
:
df[df[0].isin(df.loc[df[5] == "Green", 0])]
# if it has to be the last column, instead of the 6h column, use `iloc` instead:
# df[df[0].isin(df.loc[df.iloc[:, -1] == "Green", 0])]
Image1 2 3 4 5 Green
Image1 3 4 5 6 Red
Image3 1 4 8 9 Green
Image4 5 3 0 1 Yellow
Image4 6 2 1 1 Green
Breaking it down:
the inner loc
retrieves the Images that contain Green
in the first column:
df.loc[df[5] == "Green", 0]
0 Image1
3 Image3
5 Image4
Name: 0, dtype: object
Passing that to isin
, you get a boolean mask of where the first column matches one of those values:
df[0].isin(df.loc[df[5] == "Green", 0])
0 True
1 True
2 False
3 True
4 True
5 True
Name: 0, dtype: bool
Which you can use to filter your df
:
df[df[0].isin(df.loc[df[5] == "Green", 0])]
Upvotes: 1