Neil
Neil

Reputation: 8247

how to group by and count in pandas

I have following dataframe in pandas.

 ID            Names               Class
 ABC       [James,Jack,Bob]          A
 DES       [Michel,Sara]             B
 ERT       [Jack,Mike]               A

I want to count names in each class

Desired output

 Class      Names        Count
  A         James          1
            Jack           2 
            Bob            1
            Mike           1

  B         Michel         1  
            Sara           1

I am using following code,but it does not seem to work.

rows = []
_ = df.apply(lambda row: [rows.append([row['Class'], nn]) 
                     for nn in row.Names], axis=1)
df_new = pd.DataFrame(rows, columns=df.columns).set_index(['Class'])

Upvotes: 1

Views: 1001

Answers (3)

samarth
samarth

Reputation: 4024

This should do the trick

df.groupby(['col1', 'col2', .., 'coln']).size()

Upvotes: 0

jezrael
jezrael

Reputation: 862491

I think you can use numpy.repeat for repeat values by legths by str.len and flat values of nested lists by chain. Last groupby by both columns and get size:

from  itertools import chain

df1 = pd.DataFrame({
        "Class": np.repeat(df.Class.values, df.Names.str.len()),
        "Names": list(chain.from_iterable(df.Names))})

print (df1)
  Class   Names
0     A   James
1     A    Jack
2     A     Bob
3     B  Michel
4     B    Sara
5     A    Jack
6     A    Mike

print (df1.groupby(['Class','Names']).size())
Class  Names 
A      Bob       1
       Jack      2
       James     1
       Mike      1
B      Michel    1
       Sara      1
dtype: int64

I think you need sort_values:

df2 = df1.groupby(['Class','Names']).size().reset_index(name='Count')
print (df2)
  Class   Names  Count
0     A     Bob      1
1     A    Jack      2
2     A   James      1
3     A    Mike      1
4     B  Michel      1
5     B    Sara      1

print (df2.sort_values(['Class','Names'], ascending=[True, False]))
  Class   Names  Count
3     A    Mike      1
2     A   James      1
1     A    Jack      2
0     A     Bob      1
5     B    Sara      1
4     B  Michel      1
print (df2.sort_values(['Class','Count'], ascending=[True, False]))

  Class   Names  Count
1     A    Jack      2
0     A     Bob      1
2     A   James      1
3     A    Mike      1
4     B  Michel      1
5     B    Sara      1

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Solution:

In [69]: lst_col = 'Names'
    ...: pd.DataFrame({
    ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.difference([lst_col])
    ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)}) \
    ...:   .groupby(['Class','Names']).size()
    ...:
Out[69]:
Class  Names
A      Bob       1
       Jack      2
       James     1
       Mike      1
B      Michel    1
       Sara      1
dtype: int64

Explanation:

In [70]: pd.DataFrame({
    ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.difference([lst_col])
    ...: })
Out[70]:
  Class   ID
0     A  ABC
1     A  ABC
2     A  ABC
3     B  DES
4     B  DES
5     A  ERT
6     A  ERT

In [71]: pd.DataFrame({
    ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.difference([lst_col])
    ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[71]:
  Class   ID   Names
0     A  ABC   James
1     A  ABC    Jack
2     A  ABC     Bob
3     B  DES  Michel
4     B  DES    Sara
5     A  ERT    Jack
6     A  ERT    Mike

Upvotes: 1

Related Questions