Reputation: 369
I have a certain condition (Incident = yes
) and I want to know the values in two columns fulfilling this condition. I have a very big data frame (many rows and many columns) and I am looking for a "screening" function.
To illustrate the following example with the df
(which has many more columns than shown):
Repetition Step Incident Test1 Test2
1 1 no 10 20
1 1 no 9 11
1 2 yes 9 19
1 2 yes 11 20
1 2 yes 12 22
1 3 yes 9 18
1 3 yes 8 18
What I would like to get as an answer is
Repetition Step
1 2
1 3
If I only wanted to know the Step, I would use the following command:
df[df.Incident == 'yes'].Step.unique()
Is there a similar command to get the values of two columns for a specific condition?
Thanks for the help! :-)
Upvotes: 0
Views: 431
Reputation: 28659
You could use the query option for the condition, select the interested columns, and finally remove duplicate values
df.query('Incident=="yes"').filter(['Repetition','Step']).drop_duplicates()
OR
you could use the Pandas' loc method, set the rows as the condition, set the columns part with the columns you are interested in, then drop the duplicates.
df.loc[df.Incident=="yes",['Repetition','Step']].drop_duplicates()
Upvotes: 1