user3664020

Reputation: 3020

Replicating R's dcast in Python

I have a large pandas DataFrame that looks something like this:

col1     col2     col3    col4
 a        d        sd       2
 b        sd       sd       2
 a        ds       hg       3
 a        ew       rt       3
 b        ss       qq       4

I want output similar to the following:

col1     sum
a        8
b        6

The 'sum' column holds the sum of all col4 values for each unique value of col1. In R this can be done using dcast:

dcast(dataframe, col1 ~ count, sum, value.var = 'col4')

How do I do this in Python?

Upvotes: 0

Views: 1042

Answers (1)

abarnert

Reputation: 365945

I think what you're looking for is what's described in Group By: split-apply-combine:

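groups = df.groupby('col1')           # split: one group per col1 value
splitgroups = groups['col4']          # select the column to aggregate
sums = splitgroups.aggregate('sum')   # apply and combine: one sum per group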

Or, more directly:

sums = df.groupby('col1').aggregate({'col4': 'sum'})

But do read the whole page. The pandas groupby feature is more flexible than R's dcast (it's designed to also accomplish everything SQL aggregation, Excel pivots, etc. can do), which means your ideas may not always map one-to-one between the two.
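As a rough illustration of that flexibility, a single groupby can compute several aggregates at once. This is only a sketch using named aggregation on a cut-down version of the question's data; the output column names (total, mean, rows) are made up for the example.

```python
import pandas as pd

# Only the two columns that matter here, taken from the question's data
df = pd.DataFrame({
    'col1': 'a b a a b'.split(),
    'col4': (2, 2, 3, 3, 4),
})

# Named aggregation: each keyword is an output column name (hypothetical here),
# each value is an (input column, aggregation) pair.
summary = df.groupby('col1').agg(
    total=('col4', 'sum'),
    mean=('col4', 'mean'),
    rows=('col4', 'count'),
)
print(summary)
```

The result is indexed by col1, with one column per aggregate.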

Here it is in action:

>>> import pandas as pd
>>> # your DataFrame, with a default index
>>> df = pd.DataFrame({'col1': 'a b a a b'.split(), 'col2': 'd sd ds ew ss'.split(), 'col3': 'sd sd hg rt qq'.split(), 'col4': (2, 2, 3, 3, 4)})
>>> sums = df.groupby('col1').aggregate({'col4': 'sum'})
>>> sums
      col4
col1
a        8
b        6
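
If you want the exact two-column layout from the question, with col1 as a regular column and the total named sum, one possible approach is to pass as_index=False and rename the result. This is a minimal sketch; result is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': 'a b a a b'.split(),
    'col2': 'd sd ds ew ss'.split(),
    'col3': 'sd sd hg rt qq'.split(),
    'col4': (2, 2, 3, 3, 4),
})

# as_index=False keeps col1 as an ordinary column instead of the index;
# rename gives the aggregated column the name used in the question.
result = (
    df.groupby('col1', as_index=False)['col4']
      .sum()
      .rename(columns={'col4': 'sum'})
)
print(result)
```

Calling reset_index() on the grouped result from the session above should give the same shape, if you prefer to flatten it afterwards.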

Upvotes: 2
