Reputation: 33
I've googled quite some bit regarding this and could not find an answer that applied to my problem. The issue Im having is that I've got a dataframe and each row has a variable and I want to continuously insert rows with variable C which is the value of variable A + B. Example:
TOWN YEAR Var Value
Amsterdam 2019 A 1
Amsterdam 2019 B 2
Amsterdam 2020 A 1
Amsterdam 2020 B 3
Rotterdam 2019 A 4
Rotterdam 2019 B 4
Rotterdam 2020 A 5
Rotterdam 2020 B 2
Where the desired output would insert a row and sum A and B respectively for rows that are identical in the other columns. My attempt right now backfired as I used groupby and sum, then converted it into a list and then just tried to append it as a seperate column (var_C). The reason it backfired is because I had to duplicate each value to match the length of the original dataset. In the end the length of the list did not match the length of the original dataset.
data_current = data[data['var'].isin(['A', 'B'])]
data_var_c = data_current.groupby(['TOWN', 'year'])['value'].sum()
values = data_var_c.tolist()
values_dup = [val for val in values for _ in (0, 1)]
len(values_dup)
Any feedback would be appreciated!
Upvotes: 3
Views: 658
Reputation: 35686
A pivot stack option:
import pandas as pd
df = pd.DataFrame({
'TOWN': {0: 'Amsterdam', 1: 'Amsterdam', 2: 'Amsterdam', 3: 'Amsterdam',
4: 'Rotterdam', 5: 'Rotterdam', 6: 'Rotterdam', 7: 'Rotterdam'},
'YEAR': {0: 2019, 1: 2019, 2: 2020, 3: 2020, 4: 2019, 5: 2019, 6: 2020,
7: 2020},
'Var': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'A', 5: 'B', 6: 'A', 7: 'B'},
'Value': {0: 1, 1: 2, 2: 1, 3: 3, 4: 4, 5: 4, 6: 5, 7: 2}
})
new_df = df.pivot(index=['TOWN', 'YEAR'], columns='Var')['Value'] \
.assign(C=lambda x: x.agg('sum', axis=1)) \
.stack() \
.rename('Value') \
.reset_index()
print(new_df)
new_df
:
TOWN YEAR Var Value
0 Amsterdam 2019 A 1
1 Amsterdam 2019 B 2
2 Amsterdam 2019 C 3
3 Amsterdam 2020 A 1
4 Amsterdam 2020 B 3
5 Amsterdam 2020 C 4
6 Rotterdam 2019 A 4
7 Rotterdam 2019 B 4
8 Rotterdam 2019 C 8
9 Rotterdam 2020 A 5
10 Rotterdam 2020 B 2
11 Rotterdam 2020 C 7
Upvotes: 0
Reputation: 33
I overcomplicated it, it is as simple as grouping by TOWN
and Year
, taking the value
column and apply a sum
function to get the overall sum:
data['c'] = data_current.groupby(['TOWN', 'year'])['value'].transform('sum')
This, however, is not the desired output as it adds the summation as another column. Whereas, Nk03's answer adds the summation as a seperate row.
Upvotes: 0
Reputation: 14949
You can use groupby
and pd.concat
:
result = (
pd.concat([
df,
df.groupby(['TOWN', 'YEAR'], as_index=False)
.agg(sum)
.assign(Var = 'C')
])
)
result = result.sort_values(['TOWN', 'YEAR', 'Var'])
OUTPUT:
TOWN YEAR Var Value
0 Amsterdam 2019 A 1
1 Amsterdam 2019 B 2
0 Amsterdam 2019 C 3
2 Amsterdam 2020 A 1
3 Amsterdam 2020 B 3
1 Amsterdam 2020 C 4
4 Rotterdam 2019 A 4
5 Rotterdam 2019 B 4
2 Rotterdam 2019 C 8
6 Rotterdam 2020 A 5
7 Rotterdam 2020 B 2
3 Rotterdam 2020 C 7
Upvotes: 5