Size and percentage of elements

Question

I'm reading a CSV file with pandas and after I read the file I'd like to calculate 2 things:

Number of items
% of items

For example, if my data is [X,X,Y,Z,Z,X,X,Y,Z,Y], I want my output to be:

X 4 40.0
Y 3 30.0
Z 3 30.0

I tried the following, but it only outputs the counts:

train = pd.read_csv("./../input/train.csv")
grouped = train.groupby([x ,y]).size()

And this only calculates the percentages:

train = pd.read_csv("./../input/train.csv")
grouped = grouped.groupby(level=[0]).apply(lambda x: x / x.sum())

How can I get both?

jezrael · Accepted Answer

I think you need to divide the new count column by its sum and multiply by 100 to get the percentage column:

import pandas as pd

df = pd.DataFrame({'A':list('XXYZZXXYZY')})

# count rows per value, then express each count as a share of the total
df = df.groupby('A').size().reset_index(name='count')
df['%'] = df['count'].div(df['count'].sum()).mul(100)
print(df)
   A  count     %
0  X      4  40.0
1  Y      3  30.0
2  Z      3  30.0
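
To tie this back to the CSV in the question, here is a minimal sketch of the same groupby approach wrapped in a helper. The in-memory CSV, the column name `label` and the function name `count_and_percent` are all assumptions made for illustration; the real train.csv and its columns are not shown in the question.

```python
import io
import pandas as pd

# Stand-in for train.csv; the column name "label" is hypothetical
csv_text = "label\nX\nX\nY\nZ\nZ\nX\nX\nY\nZ\nY\n"
train = pd.read_csv(io.StringIO(csv_text))

def count_and_percent(frame, column):
    """Return one row per distinct value with its count and percentage."""
    out = frame.groupby(column).size().reset_index(name='count')
    # each count divided by the total number of rows, scaled to percent
    out['%'] = out['count'].div(out['count'].sum()).mul(100)
    return out

result = count_and_percent(train, 'label')
print(result)
```

With a real file, replacing `io.StringIO(csv_text)` with the file path should be the only change needed, provided the column exists.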

Alternative solution with value_counts:

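# start again from the raw data, because df was overwritten by the previous snippet
df = pd.DataFrame({'A':list('XXYZZXXYZY')})

df = pd.concat([df['A'].value_counts().rename('count'),
                df['A'].value_counts(normalize=True).rename('%').mul(100)], axis=1)

df = df.rename_axis('A').reset_index()
print(df)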
   A  count     %
0  X      4  40.0
1  Y      3  30.0
2  Z      3  30.0
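
The code in the question groups by two columns and then normalizes within the first level, so the percentage there may be meant per group rather than over the whole file. If that is the case, a sketch along these lines could work. The column names `x` and `y` and the sample values are made up, since the real columns are not shown.

```python
import pandas as pd

# Hypothetical two-column data standing in for the question's train.csv
df = pd.DataFrame({
    'x': list('aaabbb'),
    'y': list('XXYXYY'),
})

# Count rows per (x, y) pair
out = df.groupby(['x', 'y']).size().reset_index(name='count')

# Percentage of each pair within its x group:
# transform('sum') repeats the group total on every row of that group
out['%'] = out['count'].div(out.groupby('x')['count'].transform('sum')).mul(100)
print(out)
```

Here the percentages add up to 100 within each value of `x`. Dividing by `out['count'].sum()` instead would give each pair's share of all rows.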
