Reputation: 8247
I have following dataframe in pandas.
ID Names Class
ABC [James,Jack,Bob] A
DES [Michel,Sara] B
ERT [Jack,Mike] A
I want to count names in each class
Desired output
Class Names Count
A James 1
Jack 2
Bob 1
Mike 1
B Michel 1
Sara 1
I am using following code,but it does not seem to work.
rows = []
_ = df.apply(lambda row: [rows.append([row['Class'], nn])
for nn in row.Names], axis=1)
df_new = pd.DataFrame(rows, columns=df.columns).set_index(['Class'])
Upvotes: 1
Views: 1001
Reputation: 4024
This should do the trick
df.groupby(['col1', 'col2', .., 'coln']).size()
Upvotes: 0
Reputation: 862491
I think you can use numpy.repeat
for repeat values by legths by str.len
and flat values of nested lists
by chain
. Last groupby
by both columns and get size
:
from itertools import chain
df1 = pd.DataFrame({
"Class": np.repeat(df.Class.values, df.Names.str.len()),
"Names": list(chain.from_iterable(df.Names))})
print (df1)
Class Names
0 A James
1 A Jack
2 A Bob
3 B Michel
4 B Sara
5 A Jack
6 A Mike
print (df1.groupby(['Class','Names']).size())
Class Names
A Bob 1
Jack 2
James 1
Mike 1
B Michel 1
Sara 1
dtype: int64
I think you need sort_values
:
df2 = df1.groupby(['Class','Names']).size().reset_index(name='Count')
print (df2)
Class Names Count
0 A Bob 1
1 A Jack 2
2 A James 1
3 A Mike 1
4 B Michel 1
5 B Sara 1
print (df2.sort_values(['Class','Names'], ascending=[True, False]))
Class Names Count
3 A Mike 1
2 A James 1
1 A Jack 2
0 A Bob 1
5 B Sara 1
4 B Michel 1
print (df2.sort_values(['Class','Count'], ascending=[True, False]))
Class Names Count
1 A Jack 2
0 A Bob 1
2 A James 1
3 A Mike 1
4 B Michel 1
5 B Sara 1
Upvotes: 1
Reputation: 210832
Solution:
In [69]: lst_col = 'Names'
...: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.difference([lst_col])
...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)}) \
...: .groupby(['Class','Names']).size()
...:
Out[69]:
Class Names
A Bob 1
Jack 2
James 1
Mike 1
B Michel 1
Sara 1
dtype: int64
Explanation:
In [70]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.difference([lst_col])
...: })
Out[70]:
Class ID
0 A ABC
1 A ABC
2 A ABC
3 B DES
4 B DES
5 A ERT
6 A ERT
In [71]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.difference([lst_col])
...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[71]:
Class ID Names
0 A ABC James
1 A ABC Jack
2 A ABC Bob
3 B DES Michel
4 B DES Sara
5 A ERT Jack
6 A ERT Mike
Upvotes: 1