ManuelVillamil
ManuelVillamil

Reputation: 87

How can I create a column in an actual dataframe by indexing another dataframe using the values in two columns from the actual dataframe

Good day. I have two dataset (df1,df2). I am trying to fill the column 'values' in df2 by using the column 'site_before' as the row index in df1 and the column 'site' as the column index in df1.

The dataset df1:

      ANA01  PHO01  ATL  BAL12  BOS07   
ANA01   0     0      3     3     3
PHO01   0     0      3     3     3
ATL    -3    -3      0     0     0 
BAL12   -3   -3      0     0     0 
BOS07   -3    -3     0     0     0

"The first column is the indexes of the rows"

The dataset df2:

    Game_ID     site_before  site   values
1   ANA199804010    ANA01   ANA01   
3   ANA199804020    ANA01   ATL 
5   ANA199804030    ANA01   BAL12   
7   ANA199804040    ANA01   BOS07   
9   ANA199804050    ANA01   ANA01   
674 BOS199804300    BOS07   BOS07   
31  ANA199805010    BOS07   ANA01   
33  ANA199805020    PHO01   ANA01   
35  ANA199805030    PHO01   PHO01   
37  ANA199805040    PHO01   ATL 
39  ANA199805050    PHO01   BAL12

I tried to do:

df2['values'] = df1.loc[df2['site_before'], df2['site']].values

but I got an error as ValueError: Wrong number of items passed 4864, placement implies 1

The result I am expecting is:

    Game_ID     site_before site    values
1   ANA199804010    ANA01   ANA01   0
3   ANA199804020    ANA01   ATL     3
5   ANA199804030    ANA01   BAL12   3
7   ANA199804040    ANA01   BOS07   3
9   ANA199804050    ANA01   ANA01   0
674 BOS199804300    BOS07   BOS07   0
31  ANA199805010    BOS07   ANA01   -3
33  ANA199805020    PHO01   ANA01   0
35  ANA199805030    PHO01   PHO01   0
37  ANA199805040    PHO01   ATL     3
39  ANA199805050    PHO01   BAL12   3

Upvotes: 1

Views: 42

Answers (1)

jezrael
jezrael

Reputation: 862771

Use DataFrame.join with new MultiIndex Series created by DataFrame.stack:

df2 = df2.join(df1.stack().rename('new').rename_axis(('site_before','site')), 
               on=['site_before','site'])
print (df2)
          Game_ID site_before   site  new
1    ANA199804010       ANA01  ANA01    0
3    ANA199804020       ANA01    ATL    3
5    ANA199804030       ANA01  BAL12    3
7    ANA199804040       ANA01  BOS07    3
9    ANA199804050       ANA01  ANA01    0
674  BOS199804300       BOS07  BOS07    0
31   ANA199805010       BOS07  ANA01   -3
33   ANA199805020       PHO01  ANA01    0
35   ANA199805030       PHO01  PHO01    0
37   ANA199805040       PHO01    ATL    3
39   ANA199805050       PHO01  BAL12    3

Alternative is use DataFrame.melt with DataFrame.merge and left join:

df3 = df1.rename_axis('site_before').reset_index().melt('site_before', var_name='site')

df2 = df2.merge(df3, how='left')
print (df2)
         Game_ID site_before   site  new
0   ANA199804010       ANA01  ANA01    0
1   ANA199804020       ANA01    ATL    3
2   ANA199804030       ANA01  BAL12    3
3   ANA199804040       ANA01  BOS07    3
4   ANA199804050       ANA01  ANA01    0
5   BOS199804300       BOS07  BOS07    0
6   ANA199805010       BOS07  ANA01   -3
7   ANA199805020       PHO01  ANA01    0
8   ANA199805030       PHO01  PHO01    0
9   ANA199805040       PHO01    ATL    3
10  ANA199805050       PHO01  BAL12    3

Upvotes: 1

Related Questions