Reputation: 651
I have two pandas dataframes like this:
category time day 1 2 3
a 24 1 15 35 20
a 22 1 40 35 15
category day time total
b 1 19 10
b 1 20 10
b 1 21 10
b 1 22 10
b 1 23 10
b 1 24 10
b 2 1 10
a 1 0 10
a 1 1 10
a 1 2 10
a 1 3 10
a 1 4 10
a 1 5 10
a 1 6 10
a 1 7 10
a 1 8 10
a 1 9 10
a 1 10 10
a 1 11 10
a 1 12 10
a 1 13 10
a 1 14 10
a 1 15 10
a 1 16 10
a 1 17 10
a 1 18 10
a 1 19 10
a 1 20 10
a 1 21 10
a 1 22 10
a 1 23 10
a 1 24 10
a 2 1 10
I would like to add the element in column '2' in the first dataframe to the corresponding element in the second dataframe, the element in column '1' should be added to the cell above and the one in column '3' to the cell belove
Rendering this result:
category day time total
b 1 19 10
b 1 20 10
b 1 21 10
b 1 22 10
b 1 23 10
b 1 24 10
b 2 1 10
a 1 0 10
a 1 1 10
a 1 2 10
a 1 3 10
a 1 4 10
a 1 5 10
a 1 6 10
a 1 7 10
a 1 8 10
a 1 9 10
a 1 10 10
a 1 11 10
a 1 12 10
a 1 13 10
a 1 14 10
a 1 15 10
a 1 16 10
a 1 17 10
a 1 18 10
a 1 19 10
a 1 20 10
a 1 21 10 + 40
a 1 22 10 + 35
a 1 23 10 + 15 + 15
a 1 24 10 + 35
a 2 1 10 + 20
Right now I use a loop but it is to slow, and I need something faster:
for row in df_1:
date = row[0]
time = row[1]
category = row[2]
total = row[3:]
index = df_2.index[(df_2['date'] == date) & (df_2['time'] == time)].[0] & (df_2['category'] == category)]
df_2['total'].iloc[index - 1 : index + 1 + 1 ] = df_2['total'].iloc[index - 1 : index + 1 + 1] + total
Is there a good way to do this using pandas? Should I set the index of my second data frame to "day" and "time" to be able to retrieve them fast? I have a very large dataset so it is important to me that the approach I choose is fast.
Upvotes: 0
Views: 107
Reputation: 153510
Let's see if this is faster:
df11 = df1.melt(['time','day'])
df12 = (df11.rename(columns={'value':'total'})
.set_index(['day',
df11['time']+df11.groupby(['time','day']).cumcount()-1])
.drop(['time','variable'], axis=1)
.rename_axis(['day','time']).sum(level=[0,1]))
df_out = df2.set_index(['day','time']).add(df12, fill_value=0).reset_index()
print(df_out)
Output:
day time total
0 1 14 10.0
1 1 15 50.0
2 1 16 45.0
3 1 17 40.0
4 1 18 45.0
5 1 19 30.0
Upvotes: 1