Reputation: 19
So I'm using pandas and trying to add a new column in called 'Total' where its the sum of all the numbers of vehicles for that year.
From this:
type year number
Private cars 2005 401638
Motorcycles 2005 138588
Off peak cars 2005 12947
Motorcycles 2005 846
To something like this:
type year number Total
Private cars 2005 401638 554019
Motorcycles 2005 138588
Off peak cars 2005 12947
Motorcycles 2005 846
Upvotes: 0
Views: 81
Reputation: 865
This gives a similar dataframe:
total = df['numer'].sum()
df['Total'] = np.ones_line(df['number'].values) * total
Upvotes: 0
Reputation: 863166
Use GroupBy.transform
and then if necessary replace duplicated values:
df['Total'] = df.groupby('year')['number'].transform('sum')
print (df)
type year number Total
0 Private cars 2005 1 3
1 Motorcycles 2005 2 3
2 Off peak cars 2006 5 20
3 Motorcycles 2006 7 20
4 Motorcycles1 2006 8 20
df.loc[df['year'].duplicated(), 'Total'] = np.nan
print (df)
type year number Total
0 Private cars 2005 1 3.0
1 Motorcycles 2005 2 NaN
2 Off peak cars 2006 5 20.0
3 Motorcycles 2006 7 NaN
4 Motorcycles1 2006 8 NaN
Replacing to empty values is possible, but not recommended, because get mixed values numeric with strings and some function should failed:
df.loc[df['year'].duplicated(), 'Total'] = ''
print (df)
type year number Total
0 Private cars 2005 1 3
1 Motorcycles 2005 2
2 Off peak cars 2006 5 20
3 Motorcycles 2006 7
4 Motorcycles1 2006 8
Upvotes: 2
Reputation: 164773
Using GroupBy
+ transform
with sum
:
df['Year_Total'] = df.groupby('year')['number'].transform('sum')
Note this will give you the yearly total for each row. If you wish to "blank out" totals for certain rows, you should specify precisely the logic for this.
Upvotes: 2