Emanon
Emanon

Reputation: 25

Selecting multiple rows based on different column values

I'm trying to evaluate some images based on the classification. I use the piece of code below to read the csv file:

import pandas as pd
file = pd.read_csv('test.csv', header=None)

So I have something that looks like this:

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image2  4  5  6  7  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

So in case I want to keep the images with the value "Green" the output should look like this:

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

which means that I want to keep the images with the same id in the first column when there is at least one with the element I check is in the last column.

I used the isin method but I don't know how to keep the images the rest of the rows with the images that do have at least on time the value "Green" in the last column.

Upvotes: 0

Views: 47

Answers (2)

Erfan
Erfan

Reputation: 42946

We can use GroupBy.any here, where we check if any of the rows suffice our condition:

df[df[5].eq("Green").groupby(df[0]).transform("any")]

        0  1  2  3  4       5
0  Image1  2  3  4  5   Green
1  Image1  3  4  5  6     Red
3  Image3  1  4  8  9   Green
4  Image4  5  3  0  1  Yellow
5  Image4  6  2  1  1   Green

Upvotes: 1

sacuL
sacuL

Reputation: 51425

You can use loc to find the values in the first column where the 6th column is Green, and use that as your values to pass to isin:

df[df[0].isin(df.loc[df[5] == "Green", 0])]
# if it has to be the last column, instead of the 6h column, use `iloc` instead:
# df[df[0].isin(df.loc[df.iloc[:, -1] == "Green", 0])]

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

Breaking it down:

the inner loc retrieves the Images that contain Green in the first column:

df.loc[df[5] == "Green", 0] 
0    Image1
3    Image3
5    Image4
Name: 0, dtype: object

Passing that to isin, you get a boolean mask of where the first column matches one of those values:

df[0].isin(df.loc[df[5] == "Green", 0])
0     True
1     True
2    False
3     True
4     True
5     True
Name: 0, dtype: bool

Which you can use to filter your df:

df[df[0].isin(df.loc[df[5] == "Green", 0])]

Upvotes: 1

Related Questions