Reputation: 3020
I have a large pandas DataFrame that looks something like this:
col1  col2  col3  col4
a     d     sd    2
b     sd    sd    2
a     ds    hg    3
a     ew    rt    3
b     ss    qq    4
I want output similar to the following:
col1  sum
a     8
b     6
The 'sum' column holds the sum of all col4 values for each unique value of col1.
In R this can be done using dcast:
dcast(dataframe, col1 ~ count, sum, value.var = 'col4')
How do I do this in Python?
Upvotes: 0
Views: 1042
Reputation: 365945
I think what you're looking for is what's described in Group By: split-apply-combine:
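import numpy as np  # needed for np.sum below

groups = df.groupby('col1')            # split the rows by their col1 value
splitgroups = groups['col4']           # select the column to aggregate
sums = splitgroups.aggregate(np.sum)   # sum col4 within each group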
Or, more directly:
sums = df.groupby('col1').aggregate({'col4': np.sum})
But read the whole page instead; the pandas groupby feature is more flexible than R's dcast (it's designed to also accomplish everything SQL aggregation, Excel pivots, etc. can do), but that means your ideas may not always map one-to-one between the two.
Here it is in action:
>>> import numpy as np
>>> import pandas as pd
>>> # your DataFrame, with a default index
>>> df = pd.DataFrame({'col1': 'a b a a b'.split(), 'col2': 'd sd ds ew ss'.split(), 'col3': 'sd sd hg rt qq'.split(), 'col4': (2, 2, 3, 3, 4)})
>>> sums = df.groupby('col1').aggregate({'col4': np.sum})
>>> sums
      col4
col1
a        8
b        6
Upvotes: 2