ccsucic
ccsucic

Reputation: 129

Adding a column with values from another dataframe based on column conditions

I have two dataframes of differing index lengths that look like this:

df_1:

    State  Month  Total Time ... N columns
     AL     4       1000
     AL     5       500
      .
      .
      .
     VA     11      750
     VA     12      1500 

df_2:

    State   Month ... N columns
     AL      4        
     AL      5
      .
      .
      .
     VA      11
     VA      12

I would like to add a Total Time column to df_2 that uses the values from the Total Time column of df_1 if the State and Month value are the same between dataframes. Ultimately, I would end up with:

df_2:

    State  Month  Total Time ... N columns
     AL     4       1000
     AL     5       500
      .
      .
      .
     VA     11      750
     VA     12      1500 

I want to do this conditionally since the index lengths are not the same. I have tried this so far:

def f(row):
     if (row['State'] == row['State']) & (row['Month'] == row['Month']):
         val = x for x in df_1["Total Time"]
     return val

df_2['Total Time'] = df_2.apply(f, axis=1)

This did not work. What method would you use to accomplish this? Any help is appreciated!

Upvotes: 4

Views: 19814

Answers (2)

Foundry
Foundry

Reputation: 33

As mentioned in other comment, pd.merge() is how you would join two dataframes and extract a column. The issue is that merging solely on 'State' and 'Month' would result in every permutation becoming a new column (all Al-4 in df_1 would join with all other AL-4 in df_2).

With your example, there'd be

df_1
  State  Month  Total Time  df_1 col...
0    AL      4        1000            6
1    AL      4         500            7
2    VA     12         750            8
3    VA     12        1500            9

df_2
  State  Month  df_2 col...
0    AL      4            1
1    AL      4            2
2    VA     12            3
3    VA     12            4


df_1_cols_to_use = ['State', 'Month', 'Total Time']
# note the selection of the column to use from df_1. We only want the column
# we're merging on, plus the column(s) we want to bring in, in this case 'Total Time'
new_df = pd.merge(df_2, df_1[df_1_cols_to_use], on=['State', 'Month'], how='left')

new_df:
  State  Month  df_2 col...  Total Time
0    AL      4            1        1000
1    AL      4            1         500
2    AL      4            2        1000
3    AL      4            2         500
4    VA     12            3         750
5    VA     12            3        1500
6    VA     12            4         750
7    VA     12            4        1500

You mention these have differing index lengths. Based on the parameters of the question, it's not possible to determine what value of Total Time would match up with a row in df_2. If there's three AL-4 entries in df_2, do they each get 1000, 500, or some combination? That info would be needed. Without this, this would be the best guess at getting all possibilities.

Upvotes: 2

Mayank Porwal
Mayank Porwal

Reputation: 34046

You can do this:

Consider my sample dataframes:

In [2327]: df_1                                                                                                                                                                                              
Out[2327]: 
  State  Month  Total Time
0    AL      2        1000
1    AB      4         500
2    BC      1         600

In [2328]: df_2                                                                                                                                                                                              
Out[2328]: 
  State  Month
0    AL      2
1    AB      5

In [2329]: df_2 = pd.merge(df_2, df_1, on=['State', 'Month'], how='left')                                                                                                                                      

In [2330]: df_2                                                                                                                                                                                              
Out[2330]: 
  State  Month  Total Time
0    AL      2      1000.0
1    AB      5         NaN

Upvotes: 5

Related Questions