robocop
robocop

Reputation: 19

Adding a new column in pandas which is the total sum of the values of another column

So I'm using pandas and trying to add a new column in called 'Total' where its the sum of all the numbers of vehicles for that year.

From this:

    type            year     number

Private cars        2005    401638
Motorcycles         2005    138588
Off peak cars       2005    12947
Motorcycles         2005    846

To something like this:

 type            year       number       Total

Private cars        2005    401638      554019
Motorcycles         2005    138588
Off peak cars       2005    12947
Motorcycles         2005    846

Upvotes: 0

Views: 81

Answers (3)

Alessandro
Alessandro

Reputation: 865

This gives a similar dataframe:

total = df['numer'].sum()
df['Total'] = np.ones_line(df['number'].values) * total

Upvotes: 0

jezrael
jezrael

Reputation: 863166

Use GroupBy.transform and then if necessary replace duplicated values:

df['Total'] = df.groupby('year')['number'].transform('sum')
print (df)
            type  year  number  Total
0   Private cars  2005       1      3
1    Motorcycles  2005       2      3
2  Off peak cars  2006       5     20
3    Motorcycles  2006       7     20
4   Motorcycles1  2006       8     20

df.loc[df['year'].duplicated(), 'Total'] = np.nan
print (df)
            type  year  number  Total
0   Private cars  2005       1    3.0
1    Motorcycles  2005       2    NaN
2  Off peak cars  2006       5   20.0
3    Motorcycles  2006       7    NaN
4   Motorcycles1  2006       8    NaN

Replacing to empty values is possible, but not recommended, because get mixed values numeric with strings and some function should failed:

df.loc[df['year'].duplicated(), 'Total'] = ''
print (df)
            type  year  number Total
0   Private cars  2005       1     3
1    Motorcycles  2005       2      
2  Off peak cars  2006       5    20
3    Motorcycles  2006       7      
4   Motorcycles1  2006       8      

Upvotes: 2

jpp
jpp

Reputation: 164773

Using GroupBy + transform with sum:

df['Year_Total'] = df.groupby('year')['number'].transform('sum')

Note this will give you the yearly total for each row. If you wish to "blank out" totals for certain rows, you should specify precisely the logic for this.

Upvotes: 2

Related Questions