Joseph
Joseph

Reputation: 135

How to Group Data with Matching Values

I am new to using Pandas and I am trying to restructure a dataframe to remove the duplicates in my first column, while also keeping the number of each duplicate, and taking the sum of values in the second column.

For example, I would like the conversion to look something like this:

[In]:

+---+------+-------+
|   | Name | Value |
+---+------+-------+
| 0 | A    |     5 |
| 1 | B    |     5 |
| 2 | C    |    10 |
| 3 | A    |    15 |
| 4 | A    |     5 |
| 5 | C    |    10 |
+---+------+-------+

[Out]:

+---+------+--------+-------+
|   | Name | Number | Total |
+---+------+--------+-------+
| 0 | A    |      3 |    25 |
| 1 | B    |      1 |     5 |
| 2 | C    |      2 |    20 |
+---+------+--------+-------+

So far, I haven't been able to find an efficient method to do this. (Or even a working method.)

I will be working with several hundred thousand rows, so I will need to find a pretty efficient method.

Upvotes: 0

Views: 44

Answers (1)

Rafal Janik
Rafal Janik

Reputation: 309

The pandas agg function on a groupby is what you want. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html

Here is an example:

import pandas as pd

df=pd.DataFrame({'Name':['A','B','C','A','A','C'],
                 'Value':[5,5,10,15,5,10]})

df.groupby('Name').agg(['count','sum'])

Hope that helps.

Upvotes: 2

Related Questions