ajayy
ajayy

Reputation: 33

Add rows in Pandas based on condition (grouping)

I've googled quite some bit regarding this and could not find an answer that applied to my problem. The issue Im having is that I've got a dataframe and each row has a variable and I want to continuously insert rows with variable C which is the value of variable A + B. Example:

      TOWN        YEAR           Var     Value                 
      Amsterdam   2019            A        1
      Amsterdam   2019            B        2
      Amsterdam   2020            A        1
      Amsterdam   2020            B        3
      Rotterdam   2019            A        4
      Rotterdam   2019            B        4
      Rotterdam   2020            A        5
      Rotterdam   2020            B        2

Where the desired output would insert a row and sum A and B respectively for rows that are identical in the other columns. My attempt right now backfired as I used groupby and sum, then converted it into a list and then just tried to append it as a seperate column (var_C). The reason it backfired is because I had to duplicate each value to match the length of the original dataset. In the end the length of the list did not match the length of the original dataset.

data_current = data[data['var'].isin(['A', 'B'])]
data_var_c = data_current.groupby(['TOWN', 'year'])['value'].sum()
values = data_var_c.tolist()
values_dup = [val for val in values for _ in (0, 1)]
len(values_dup)

Any feedback would be appreciated!

Upvotes: 3

Views: 658

Answers (3)

Henry Ecker
Henry Ecker

Reputation: 35686

A pivot stack option:

import pandas as pd

df = pd.DataFrame({
    'TOWN': {0: 'Amsterdam', 1: 'Amsterdam', 2: 'Amsterdam', 3: 'Amsterdam',
             4: 'Rotterdam', 5: 'Rotterdam', 6: 'Rotterdam', 7: 'Rotterdam'},
    'YEAR': {0: 2019, 1: 2019, 2: 2020, 3: 2020, 4: 2019, 5: 2019, 6: 2020,
             7: 2020},
    'Var': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'A', 5: 'B', 6: 'A', 7: 'B'},
    'Value': {0: 1, 1: 2, 2: 1, 3: 3, 4: 4, 5: 4, 6: 5, 7: 2}
})

new_df = df.pivot(index=['TOWN', 'YEAR'], columns='Var')['Value'] \
    .assign(C=lambda x: x.agg('sum', axis=1)) \
    .stack() \
    .rename('Value') \
    .reset_index()

print(new_df)

new_df:

         TOWN  YEAR Var  Value
0   Amsterdam  2019   A      1
1   Amsterdam  2019   B      2
2   Amsterdam  2019   C      3
3   Amsterdam  2020   A      1
4   Amsterdam  2020   B      3
5   Amsterdam  2020   C      4
6   Rotterdam  2019   A      4
7   Rotterdam  2019   B      4
8   Rotterdam  2019   C      8
9   Rotterdam  2020   A      5
10  Rotterdam  2020   B      2
11  Rotterdam  2020   C      7

Upvotes: 0

ajayy
ajayy

Reputation: 33

I overcomplicated it, it is as simple as grouping by TOWN and Year, taking the value column and apply a sum function to get the overall sum:

data['c'] = data_current.groupby(['TOWN', 'year'])['value'].transform('sum')

This, however, is not the desired output as it adds the summation as another column. Whereas, Nk03's answer adds the summation as a seperate row.

Upvotes: 0

Nk03
Nk03

Reputation: 14949

You can use groupby and pd.concat:

result = (
    pd.concat([
        df,
        df.groupby(['TOWN', 'YEAR'], as_index=False)
        .agg(sum)
        .assign(Var = 'C')
        ])
    )

result = result.sort_values(['TOWN', 'YEAR', 'Var'])

OUTPUT:

       TOWN  YEAR Var  Value
0  Amsterdam  2019   A      1
1  Amsterdam  2019   B      2
0  Amsterdam  2019   C      3
2  Amsterdam  2020   A      1
3  Amsterdam  2020   B      3
1  Amsterdam  2020   C      4
4  Rotterdam  2019   A      4
5  Rotterdam  2019   B      4
2  Rotterdam  2019   C      8
6  Rotterdam  2020   A      5
7  Rotterdam  2020   B      2
3  Rotterdam  2020   C      7

Upvotes: 5

Related Questions