stkubr
stkubr

Reputation: 391

Pandas left join gives NaN

DataFrameA

                                 RunId  isClean  isFinished
0                 APAC_P1_HSFR_REGTEST      0.0         1.0
0                      APAC_P1_REGTEST      1.0         1.0
0                APAC_P2a_HSFR_REGTEST      0.0         0.0
0                     APAC_P2a_REGTEST      0.0         1.0
0                APAC_P2b_HSFR_REGTEST      0.0         0.0
0                     APAC_P2b_REGTEST      1.0         1.0
0                APAC_P2c_HSFR_REGTEST      0.0         0.0
0                     APAC_P2c_REGTEST      0.0         1.0
0                APAC_P3a_HSFR_REGTEST      0.0         0.0
0                     APAC_P3a_REGTEST      0.0         0.0
0                APAC_P3b_HSFR_REGTEST      0.0         0.0
0                     APAC_P3b_REGTEST      0.0         0.0
0                      Cliquet_REGTEST      0.0         1.0

DataFrameB

                             RunId  Status
0            APAC_P2a_HSFR_REGTEST   Error
2            APAC_P2b_HSFR_REGTEST   Error
4            APAC_P2c_HSFR_REGTEST   Error
6            APAC_P3a_HSFR_REGTEST   Error
8            APAC_P3b_HSFR_REGTEST   Error
10             CS_P10_HSFR_REGTEST   Error
12             CS_P11_HSFR_REGTEST   Error
14              CS_P1_HSFR_REGTEST   Error
16              CS_P2_HSFR_REGTEST   Error
18              CS_P3_HSFR_REGTEST   Error
20              CS_P4_HSFR_REGTEST   Error
22              CS_P5_HSFR_REGTEST   Error
24              CS_P6_HSFR_REGTEST   Error

DataFrameA.join(DataFrameB,on='RunId',how='left',rsuffix='_y')

                                 RunId  isClean  isFinished RunId_y Status
0                 APAC_P1_HSFR_REGTEST      0.0         1.0     NaN    NaN
0                      APAC_P1_REGTEST      1.0         1.0     NaN    NaN
0                APAC_P2a_HSFR_REGTEST      0.0         0.0     NaN    NaN
0                     APAC_P2a_REGTEST      0.0         1.0     NaN    NaN
0                APAC_P2b_HSFR_REGTEST      0.0         0.0     NaN    NaN
0                     APAC_P2b_REGTEST      1.0         1.0     NaN    NaN
0                APAC_P2c_HSFR_REGTEST      0.0         0.0     NaN    NaN
0                     APAC_P2c_REGTEST      0.0         1.0     NaN    NaN
0                APAC_P3a_HSFR_REGTEST      0.0         0.0     NaN    NaN
0                     APAC_P3a_REGTEST      0.0         0.0     NaN    NaN
0                APAC_P3b_HSFR_REGTEST      0.0         0.0     NaN    NaN
0                     APAC_P3b_REGTEST      0.0         0.0     NaN    NaN
0                      Cliquet_REGTEST      0.0         1.0     NaN    NaN

A question, why all NaNs ? There are nonNaN intersections like APAC_P2a_HSFR_REGTEST. Sorry if it's a dumb question, but googling did not helped.

Upvotes: 6

Views: 20467

Answers (1)

jezrael
jezrael

Reputation: 863166

Problem is join working with index, so need create index by RunId column:

df = DataFrameA.join(DataFrameB.set_index('RunId'),on='RunId',how='left',rsuffix='_y')

Or use merge:

df = DataFrameA.merge(DataFrameB,on='RunId',how='left')

print (df)
                    RunId  isClean  isFinished Status
0    APAC_P1_HSFR_REGTEST      0.0         1.0    NaN
1         APAC_P1_REGTEST      1.0         1.0    NaN
2   APAC_P2a_HSFR_REGTEST      0.0         0.0  Error
3        APAC_P2a_REGTEST      0.0         1.0    NaN
4   APAC_P2b_HSFR_REGTEST      0.0         0.0  Error
5        APAC_P2b_REGTEST      1.0         1.0    NaN
6   APAC_P2c_HSFR_REGTEST      0.0         0.0  Error
7        APAC_P2c_REGTEST      0.0         1.0    NaN
8   APAC_P3a_HSFR_REGTEST      0.0         0.0  Error
9        APAC_P3a_REGTEST      0.0         0.0    NaN
10  APAC_P3b_HSFR_REGTEST      0.0         0.0  Error
11       APAC_P3b_REGTEST      0.0         0.0    NaN
12        Cliquet_REGTEST      0.0         1.0    NaN

Upvotes: 11

Related Questions