Python Pandas: replace groupby operation

Question

I have the following table as a pandas dataframe:

| ID | Name | Sales | Source   |
|----|------|-------|----------|
| 1  | a    | 34    | Source A |
| 2  | b    | 3423  | Source A |
| 3  | c    | 2     | Source A |
| 4  | d    | 342   | Source A |
| 3  | c    | 34    | Source A |
| 5  | e    | 234   | Source A |
| 6  | f    | 234   | Source A |
| 7  | g    | 23    | Source A |
| 1  | a    | 12    | Source B |
| 2  | b    | 42    | Source B |
| 3  | c    | 9     | Source B |
| 2  | b    | 22    | Source B |
| 1  | a    | 1     | Source B |
| 8  | h    | 56    | Source B |

What is the best way to (i) aggregate sales for each ID for each soure and (ii) put the result in two new columns "Source A" and "Source B" such that the resulting dataframe looks as follows:

| ID | Name | Source A | Source B |
|----|------|----------|----------|
| 1  | a    | 34       | 13       |
| 2  | b    | 3423     | 64       |
| 3  | c    | 36       | 9        |
| 4  | d    | 342      | 0        |
| 5  | e    | 234      | 0        |
| 6  | f    | 234      | 0        |
| 7  | g    | 23       | 0        |
| 8  | h    | 0        | 56       |

My initial approach was as follows:

data = {"ID":[1,2,3,4,3,5,6,7,1,2,3,2,1,8], 
      "Name":list("abcdcefgabcbah"), 
      "Sales":[34,3423,2,342,34,234,234,23,12,42,9,22,1,56],
      "Source":["Source A"]*8 + ["Source B"]*6
     }
df = pd.DataFrame(data)

df.groupby(["ID","Name","Source"])["Sales"].sum().unstack()

Question: my initial table is build using different files and than applying pd.concat. So it feels like I could achieve the final table by concatenating (or merging) differently in the first place. Is there a better approach to achieve this? As a side node: the actual data table consists out of 6 different sources.

Thanks for your help!

Scott Boston · Accepted Answer

You can use `pd.crosstab`:

pd.crosstab(df.Name, df.Source, df.Sales, aggfunc='sum').fillna(0)

Output:

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Or, pivot_table

df.pivot_table('Sales','Name','Source', aggfunc='sum').fillna(0)

Output:

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Or using `set_index` and `sum` with `level` parameter then `unstack`:

df.set_index(['Name','Source'])['Sales'].sum(level=[0,1]).unstack(fill_value=0)

Output:

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

Python Pandas: replace groupby operation

Answers (2)

You can use `pd.crosstab`:

Or, pivot_table

Or using `set_index` and `sum` with `level` parameter then `unstack`:

Related Questions

Python Pandas: replace groupby operation

Answers (2)

You can use pd.crosstab:

Or, pivot_table

Or using set_index and sum with level parameter then unstack:

Related Questions

You can use `pd.crosstab`:

Or using `set_index` and `sum` with `level` parameter then `unstack`: