Caner

Reputation: 59238

Size and percentage of elements

I'm reading a CSV file with pandas and after I read the file I'd like to calculate 2 things:

  1. Number of items
  2. % of items

For example, if my data is [X,X,Y,Z,Z,X,X,Y,Z,Y], I want my output to be:

X 4 40.0
Y 3 30.0
Z 3 30.0

I tried the following, but it only outputs the counts:

import pandas as pd

train = pd.read_csv("./../input/train.csv")
# x and y are the names of the columns to group by
grouped = train.groupby([x, y]).size()

And this only calculates the percentages:

train = pd.read_csv("./../input/train.csv")
grouped = grouped.groupby(level=[0]).apply(lambda x: x / x.sum())

How can I get both?

Upvotes: 2

Views: 90

Answers (2)

Pierre Gourseaud

Reputation: 2477

I would calculate the two separately and concatenate them:

import pandas as pd

d = {'col_one': ['X','X','Y','Z','Z','X','X','Y','Z','Y']}
df = pd.DataFrame(data=d)

nb_rows = len(df)

serie_count = df.groupby('col_one').size().rename('count')
serie_percentage = (100.*serie_count/nb_rows).rename('percentage')

final_df = pd.concat([serie_count, serie_percentage], axis=1)

Output:

        count   percentage
col_one
X       4       40.0
Y       3       30.0
Z       3       30.0    
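
The same two steps can be wrapped in a small helper. This is only a sketch: `count_and_percentage` is a hypothetical name (not a pandas function), and it assumes the column has no missing values, since `groupby` drops NaN keys by default while `len(df)` still counts those rows.

```python
import pandas as pd

def count_and_percentage(df, column):
    """Return per-value counts and their share of all rows, in percent."""
    counts = df.groupby(column).size().rename('count')
    # Divide by the total number of rows; assumes no NaN in `column`.
    percentage = (100.0 * counts / len(df)).rename('percentage')
    return pd.concat([counts, percentage], axis=1)

df = pd.DataFrame({'col_one': list('XXYZZXXYZY')})
result = count_and_percentage(df, 'col_one')
print(result)
```

The result is indexed by the grouped values, so a single row can be looked up with `result.loc['X']`.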

Upvotes: 1

jezrael

Reputation: 863166

I think you need to divide the new count column by its sum (using div) to get the percentage column:

import pandas as pd

df = pd.DataFrame({'A': list('XXYZZXXYZY')})

# count rows per value, then express each count as a percentage of the total
df = df.groupby('A').size().reset_index(name='count')
df['%'] = df['count'].div(df['count'].sum()).mul(100)
print(df)
   A  count     %
0  X      4  40.0
1  Y      3  30.0
2  Z      3  30.0

Alternative solution with value_counts:

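# start again from the original data, because df was overwritten above
df = pd.DataFrame({'A': list('XXYZZXXYZY')})

df = pd.concat([df['A'].value_counts().rename('count'),
                df['A'].value_counts(normalize=True).rename('%').mul(100)], axis=1)

df = df.rename_axis('A').reset_index()
print(df)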
   A  count     %
0  X      4  40.0
1  Y      3  30.0
2  Z      3  30.0
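
The question groups by two columns and takes percentages within the first level, so the same idea might be adapted as below. This is a sketch under assumptions: the column names `x` and `y` and the sample data are made up for illustration, standing in for whatever columns the real `train.csv` contains.

```python
import pandas as pd

# Hypothetical data; 'x' and 'y' stand in for the asker's grouping columns.
train = pd.DataFrame({
    'x': ['a', 'a', 'a', 'b', 'b'],
    'y': ['p', 'p', 'q', 'p', 'q'],
})

# One row per (x, y) pair with its number of occurrences.
out = train.groupby(['x', 'y']).size().reset_index(name='count')

# Share of each (x, y) pair within its x group, in percent.
group_totals = out.groupby('x')['count'].transform('sum')
out['%'] = out['count'] / group_totals * 100
print(out)
```

Using `transform('sum')` keeps the totals aligned row by row with `out`, so counts and percentages end up side by side in one frame.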

Upvotes: 3
