Reputation: 71
The following code:
import numpy as np
import pandas as pd
data = [['A', 1, 2, 5, 'blue'],
        ['A', 5, 5, 6, 'blue'],
        ['A', 4, 6, 7, 'blue'],
        ['B', 6, 5, 4, 'yellow'],
        ['B', 9, 9, 3, 'blue'],
        ['B', 7, 9, 1, 'yellow'],
        ['B', 2, 3, 1, 'yellow'],
        ['B', 5, 1, 2, 'yellow'],
        ['C', 2, 10, 9, 'green'],
        ['C', 8, 2, 8, 'green'],
        ['C', 5, 4, 3, 'green'],
        ['C', 8, 5, 3, 'green']]
df = pd.DataFrame(data, columns=['x','y','z','xy', 'color'])
groups = df.groupby('x')['color'].apply(list)
print(groups)
produces the following output:
x
A [blue, blue, blue]
B [yellow, blue, yellow, yellow, yellow]
C [green, green, green, green]
Name: color, dtype: object
I now want to check whether there is more than one category for each 'x' value. For example, A has only one category (blue), but B has two (yellow and blue). I am not sure if there is a way to do that.
Upvotes: 0
Views: 818
Reputation: 863226
Use SeriesGroupBy.nunique to count the unique values per group, then filter the index of the resulting Series for counts greater than 1:
s = df.groupby('x')['color'].nunique()
x = s.index[s > 1].tolist()
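To make the intermediate steps visible, here is a minimal self-contained sketch of the same idea. It assumes a reduced version of the question's data (only the 'x' and 'color' columns), and the names counts, has_multiple and multi_groups are illustrative, not from the original code:

```python
import pandas as pd

# Reduced version of the question's data: only the group key and the colour.
df = pd.DataFrame({
    "x": ["A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C"],
    "color": ["blue", "blue", "blue",
              "yellow", "blue", "yellow", "yellow", "yellow",
              "green", "green", "green", "green"],
})

# Number of distinct colours in each 'x' group.
counts = df.groupby("x")["color"].nunique()

# Boolean Series indexed by 'x': True where a group has more than one colour.
has_multiple = counts > 1

# Labels of the groups that have more than one colour.
multi_groups = counts.index[has_multiple].tolist()

print(counts)
print(has_multiple)
print(multi_groups)  # ['B']
```

The boolean Series has_multiple answers the "is there more than one category?" question for every group at once, while multi_groups keeps only the labels where the answer is yes.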
Your own code can be adapted by filtering on the number of unique values in each list:
groups = df.groupby('x')['color'].apply(list)
out = groups[groups.apply(lambda x: len(set(x))) > 1]
EDIT: To see the matched values, it is possible to use sets and filter by their length:
groups = df.groupby('x')['color'].apply(set)
print (groups)
x
A {blue}
B {yellow, blue}
C {green}
Name: color, dtype: object
out = groups[groups.str.len() > 1]
print (out)
x
B {yellow, blue}
Name: color, dtype: object
Or, very similarly, first convert to sets and then to lists:
groups = df.groupby('x')['color'].apply(lambda x: list(set(x)))
print (groups)
x
A [blue]
B [yellow, blue]
C [green]
Name: color, dtype: object
out = groups[groups.str.len() > 1]
print (out)
x
B [yellow, blue]
Name: color, dtype: object
Upvotes: 3