Reputation: 898
I have a DataFrame df_things that looks like this, and I want to predict the quality of the classification before training:
A B C CLASS
-----------------------
al1 bal1 cal1 Ship
al1 bal1 cal1 Ship
al1 bal2 cal2 Ship
al2 bal2 cal2 Cow
al3 bal3 cal3 Car
al1 bal2 cal3 Car
al3 bal3 cal3 Car
I want to group the rows by class so that I get an idea of the distribution of the features. I do this with (for example, on column "B"):
df_B = df_things.groupby('CLASS').B.value_counts()
which gives me the results
CLASS  B
-------------
ship   bal1  2
       bal2  1
cow    bal2  2
car    bal2  1
       bal3  2
What I want to do is display only the groups that have more than one distinct value, so that it looks like this:
CLASS  B
-------------
ship   bal1  2
       bal2  1
car    bal2  1
       bal3  2
I'm a little bit stuck, so any ideas?
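As a quick check before any filtering, a minimal sketch (rebuilding df_things from the table above) counts the distinct B values per class with nunique; the classes with a count of 1 are the ones to hide:

```python
import pandas as pd

# Rebuild the example frame from the table in the question.
df_things = pd.DataFrame({
    'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car'],
})

# Number of distinct B values in each class.
distinct_b = df_things.groupby('CLASS')['B'].nunique()
print(distinct_b)  # Car 2, Cow 1, Ship 2

# Classes worth displaying: more than one distinct B value.
print(distinct_b[distinct_b > 1].index.tolist())  # ['Car', 'Ship']
```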
Upvotes: 5
Views: 2601
Reputation: 1964
Here is another approach.
Set up the input data:
In [1]:
import pandas as pd
df_things = pd.DataFrame({
'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car']
})
print(df_things)
A B C CLASS
0 al1 bal1 cal1 Ship
1 al1 bal1 cal1 Ship
2 al1 bal2 cal2 Ship
3 al2 bal2 cal2 Cow
4 al3 bal3 cal3 Car
5 al1 bal2 cal3 Car
6 al3 bal3 cal3 Car
Reduce it to the groups (classes) that have more than one unique value in column B:
In [2]:
df_reduced = df_things.groupby(['CLASS']).filter(lambda grp: grp['B'].nunique() > 1)
print(df_reduced)
A B C CLASS
0 al1 bal1 cal1 Ship
1 al1 bal1 cal1 Ship
2 al1 bal2 cal2 Ship
4 al3 bal3 cal3 Car
5 al1 bal2 cal3 Car
6 al3 bal3 cal3 Car
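The question uses B only as an example. A minimal sketch, assuming the same df_things, repeats the filter-then-count steps for each feature column (the list ['A', 'B', 'C'] is simply the feature columns of the example table):

```python
import pandas as pd

df_things = pd.DataFrame({
    'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car'],
})

per_column = {}
for col in ['A', 'B', 'C']:
    # Keep only the classes with more than one distinct value in this column.
    # filter() runs immediately, so the lambda sees the current value of col.
    reduced = df_things.groupby('CLASS').filter(lambda grp: grp[col].nunique() > 1)
    # Count each value of the column within the remaining classes.
    per_column[col] = reduced.groupby('CLASS')[col].value_counts()

for col, counts in per_column.items():
    print(f'--- {col} ---')
    print(counts)
```

With this data, only Car survives for A, Car and Ship survive for B, and only Ship survives for C.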
Apply groupby to get the desired output
In [3]:
df_reduced.groupby(['CLASS'])['B'].value_counts()
Out[3]:
CLASS  B
Car    bal3    2
       bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
BTW, there is a typo in the df_B output in your question: Cow has only one row, so its bal2 count is 1, not 2. It should be:
In [4]:
df_B = df_things.groupby('CLASS').B.value_counts()
print(df_B)
CLASS  B
Car    bal3    2
       bal2    1
Cow    bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
Upvotes: 0
Reputation: 323326
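A solution using crosstab:
import numpy as np
import pandas as pd
s = pd.crosstab(df_things.CLASS, df_things.B)  # count of each B value per class
s[s.ne(0).sum(1) > 1].replace(0, np.nan).stack().dropna()  # classes with >1 distinct B, zero cells dropped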
CLASS  B
Car    bal2    1.0
       bal3    2.0
Ship   bal1    2.0
       bal2    1.0
dtype: float64
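The float64 dtype above comes from replacing zeros with NaN. A minimal sketch of the same crosstab idea, assuming the df_things frame from the question, stacks first and then drops the zero cells, so the counts stay integers and the result does not depend on whether stack() drops NaN in a given pandas version:

```python
import pandas as pd

df_things = pd.DataFrame({
    'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car'],
})

# Contingency table: rows are classes, columns are B values, cells are counts.
s = pd.crosstab(df_things.CLASS, df_things.B)

# Keep the rows (classes) with more than one non-zero column.
kept = s[s.ne(0).sum(axis=1) > 1]

# Long format indexed by (CLASS, B); then drop the zero cells.
stacked = kept.stack()
result = stacked[stacked.ne(0)]
print(result)
```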
Upvotes: 2
Reputation: 402852
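You can use groupby on the value_counts result to keep only the classes that have more than one distinct B value.
v = df_things.groupby('CLASS').B.value_counts()
# each row of v is one distinct (CLASS, B) pair, so group size = number of distinct B values
v[v.groupby(level=0).transform('size').gt(1)]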
CLASS  B
Car    bal3    2
       bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
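A caveat, shown as a hedged sketch: transform('nunique') applied to v counts distinct *count values*, not distinct B values. With a hypothetical class 'Boat' whose two B values each occur once, its counts are [1, 1], nunique gives 1, and Boat is dropped even though it has two B values. Counting rows per group with transform('size') does not have that problem:

```python
import pandas as pd

# Hypothetical data: 'Boat' has two different B values, each appearing once.
df = pd.DataFrame({
    'B': ['bal1', 'bal2', 'bal2', 'bal1', 'bal1'],
    'CLASS': ['Boat', 'Boat', 'Cow', 'Ship', 'Ship'],
})

v = df.groupby('CLASS').B.value_counts()

# nunique over the counts: Boat's counts are [1, 1] -> 1, so Boat is lost.
by_nunique = v[v.groupby(level=0).transform('nunique').gt(1)]

# size per group = number of distinct B values per class, so Boat is kept.
by_size = v[v.groupby(level=0).transform('size').gt(1)]

print(by_nunique)  # empty
print(by_size)     # Boat bal1 1, bal2 1
```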
Upvotes: 4