dagrun
dagrun

Reputation: 651

Adding values to Pandas dataframe from a second dataframe

I have two pandas dataframes like this:

category    time         day      1         2           3

a            24           1      15         35          20        
a            22           1      40         35          15                       


category     day    time      total

 b            1      19         10 
 b            1      20         10   
 b            1      21         10  
 b            1      22         10  
 b            1      23         10 
 b            1      24         10   
 b            2      1          10

 a            1      0          10  
 a            1      1          10  
 a            1      2          10  
 a            1      3          10  
 a            1      4          10  
 a            1      5          10  
 a            1      6          10  
 a            1      7          10  
 a            1      8          10  
 a            1      9          10  
 a            1      10         10  
 a            1      11         10  
 a            1      12         10  
 a            1      13         10  
 a            1      14         10  
 a            1      15         10  
 a            1      16         10  
 a            1      17         10  
 a            1      18         10       
 a            1      19         10 
 a            1      20         10   
 a            1      21         10  
 a            1      22         10  
 a            1      23         10 
 a            1      24         10   
 a            2      1          10

I would like to add the element in column '2' in the first dataframe to the corresponding element in the second dataframe, the element in column '1' should be added to the cell above and the one in column '3' to the cell belove

Rendering this result:

category     day    time      total

 b            1      19         10 
 b            1      20         10   
 b            1      21         10  
 b            1      22         10  
 b            1      23         10 
 b            1      24         10   
 b            2      1          10

 a            1      0          10  
 a            1      1          10  
 a            1      2          10  
 a            1      3          10  
 a            1      4          10  
 a            1      5          10  
 a            1      6          10  
 a            1      7          10  
 a            1      8          10  
 a            1      9          10  
 a            1      10         10  
 a            1      11         10  
 a            1      12         10  
 a            1      13         10  
 a            1      14         10  
 a            1      15         10  
 a            1      16         10  
 a            1      17         10  
 a            1      18         10       
 a            1      19         10
 a            1      20         10   
 a            1      21         10      + 40
 a            1      22         10      + 35
 a            1      23         10 + 15 + 15 
 a            1      24         10 + 35   
 a            2      1          10 + 20

Right now I use a loop but it is to slow, and I need something faster:

for row in df_1:
            date = row[0]
            time = row[1]
            category = row[2]
            total = row[3:]
            index = df_2.index[(df_2['date'] == date) & (df_2['time'] == time)].[0] & (df_2['category'] == category)]

            df_2['total'].iloc[index - 1 : index + 1 + 1 ] = df_2['total'].iloc[index - 1 : index + 1 + 1] + total

Is there a good way to do this using pandas? Should I set the index of my second data frame to "day" and "time" to be able to retrieve them fast? I have a very large dataset so it is important to me that the approach I choose is fast.

Upvotes: 0

Views: 107

Answers (1)

Scott Boston
Scott Boston

Reputation: 153510

Let's see if this is faster:

df11 = df1.melt(['time','day'])

df12 = (df11.rename(columns={'value':'total'})
            .set_index(['day',
                     df11['time']+df11.groupby(['time','day']).cumcount()-1])
            .drop(['time','variable'], axis=1)
            .rename_axis(['day','time']).sum(level=[0,1]))

df_out = df2.set_index(['day','time']).add(df12, fill_value=0).reset_index()

print(df_out)

Output:

   day  time  total
0    1    14   10.0
1    1    15   50.0
2    1    16   45.0
3    1    17   40.0
4    1    18   45.0
5    1    19   30.0

Upvotes: 1

Related Questions