kentoto

Reputation: 59

Filter rows that have the same value in one column but multiple values in another column in a dataframe

Say I have a dataframe with three columns like:

index A B C
1 foo One 1
2 foo Two 2
3 foo Three 3
4 bar One 2
5 bar One 1
6 num Two 3
7 num Three 3

In this case, how can I filter for the rows whose value in column B corresponds to more than one distinct value in column C, using Python Pandas?

The rows that I need are 1, 2, 4, 5, and 6, because "One" in column B has two corresponding values (1 and 2) in column C, and "Two" in column B has two corresponding values as well. Eventually I want to group them by column A if possible.
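
For reference, the example frame can be built like this (a minimal sketch, assuming the index column above is just the dataframe index):

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "num", "num"],
        "B": ["One", "Two", "Three", "One", "One", "Two", "Three"],
        "C": [1, 2, 3, 2, 1, 3, 3],
    },
    index=range(1, 8),
)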

Upvotes: 1

Views: 1210

Answers (2)

Ynjxsjmh

Reputation: 30032

You can try grouping by column B, then filtering on the number of unique values in column C.

out = df.groupby('B').filter(lambda group: len(group['C'].value_counts()) > 1)
print(out)

   index    A    B  C
0      1  foo  One  1
1      2  foo  Two  2
3      4  bar  One  2
4      5  bar  One  1
5      6  num  Two  3
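
An equivalent, slightly shorter check (a sketch on the same df) uses Series.nunique instead of counting the value_counts entries:

out = df.groupby('B').filter(lambda group: group['C'].nunique() > 1)
# the filtered rows can then be grouped by column A if needed, e.g. out.groupby('A')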

Upvotes: 1

Echo

Reputation: 600

Not an optimised solution, but it will get the work done:

import pandas as pd


# create dataframe
df = pd.DataFrame([['foo','One',1],['foo','Two',2],['foo','Three',3],['bar','One',2], ['bar','One',1],['num','Two',3],['num','Three',3]], index = range(1,8), columns = ['A','B','C'])

# get the unique values present in column B
values = list(df['B'].unique())

result = pd.DataFrame()
# iterate through the unique values and, for each one, check the corresponding values in C
for val in values:
    unique_values = list(df[df['B'] == val]['C'].unique())
    # if column C has more than one unique value for this value of B, the rows satisfy the
    # condition and are added to the result (DataFrame.append was removed in pandas 2.0,
    # so use pd.concat instead)
    if len(unique_values) > 1:
        result = pd.concat([result, df[df['B'] == val]])

print(result)

The result contains rows 1, 2, 4, 5, and 6.
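
For comparison, a vectorized sketch of the same filter without the explicit loop (assuming the same df as above):

print(df[df.groupby('B')['C'].transform('nunique') > 1])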

Always show your work in the question.

Upvotes: 0
