pitosalas
pitosalas

Reputation: 10862

Merge not behaving as expected in Pandas

I am trying to compute zscores for a subset of the columns in my dataframe (combo) and then create new columns in that data frame for those zscores. Notice that when the zscores are pd.concat'ed the resultant new columns are all NaN. That's the problem I need help with.

I think it might have to do with how concat is adding the new columns, because there's no unique key to match up. But when I tried to preserve the email in the zcores intermediate table it didn't solve the problem. So it might be something else.

zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.rename(lambda x:colrename(x, "zscore "), axis=1)
newcombo = pd.concat([combo, zscores], axis=1)

combo.iloc[4]: 

email            [email protected]
all pas                             54.84
all partic                          92.21
course                              60.39
pa grade PA01                        67.7
pa grade PA02                          82
pa grade PA03                          21
pa grade PA04                           0
pa grade PA05                          43
pa grade PA06                          29
pa grade PA07                          61
pa grade PA08                          63
pa grade PA09                         NaN
pa grade PA10                          72
pa grade PA11                           0
resub PA01                            NaN
resub PA02                            NaN
resub PA03                            NaN
resub PA04                            NaN
resub PA05                            NaN
resub PA06                            NaN
resub PA07                            NaN
resub PA08                            NaN
resub PA09                            NaN
resub PA10                            NaN
resub PA11                            NaN
initial PA01                           56
initial PA02                      83.3333
initial PA03                           30
initial PA04                            0
initial PA05                           61
initial PA06                           42
initial PA07                           80
initial PA08                           90
initial PA09                          NaN
initial PA10                           97
initial PA11                            0
resubmits                               0
resub mean                            NaN
initial mean                      53.9333
pa grade mean                       43.87
Name: 4, dtype: object

zscores.iloc[4]:

zscore PA01   -0.562523
zscore PA02   -0.418858
zscore PA03   -1.722308
zscore PA04   -1.378762
zscore PA05   -2.291849
zscore PA06   -0.503729
zscore PA07   -0.343543
zscore PA08   -2.037249
zscore PA09   -0.064932
zscore PA10   -0.428859
zscore PA11   -0.735842
Name: 5, dtype: float64

newcombo:

email            [email protected]
all pas                             54.84
all partic                          92.21
course                              60.39
pa grade PA01                        67.7
pa grade PA02                          82
pa grade PA03                          21
pa grade PA04                           0
pa grade PA05                          43
pa grade PA06                          29
pa grade PA07                          61
pa grade PA08                          63
pa grade PA09                         NaN
pa grade PA10                          72
pa grade PA11                           0
resub PA01                            NaN
resub PA02                            NaN
resub PA03                            NaN
resub PA04                            NaN
resub PA05                            NaN
resub PA06                            NaN
resub PA07                            NaN
resub PA08                            NaN
resub PA09                            NaN
resub PA10                            NaN
resub PA11                            NaN
initial PA01                           56
initial PA02                      83.3333
initial PA03                           30
initial PA04                            0
initial PA05                           61
initial PA06                           42
initial PA07                           80
initial PA08                           90
initial PA09                          NaN
initial PA10                           97
initial PA11                            0
resubmits                               0
resub mean                            NaN
initial mean                      53.9333
pa grade mean                       43.87
zscore PA01                           NaN
zscore PA02                           NaN
zscore PA03                           NaN
zscore PA04                           NaN
zscore PA05                           NaN
zscore PA06                           NaN
zscore PA07                           NaN
zscore PA08                           NaN
zscore PA09                           NaN
zscore PA10                           NaN
zscore PA11                           NaN
Name: 4, dtype: object

Upvotes: 0

Views: 55

Answers (1)

jezrael
jezrael

Reputation: 862581

It is expected behaviour, because dropna filter out all rows in subset with NaNs, so last concat add only filtered new rows and another values are converted to NaNs:

combos = pd.DataFrame({'A':list('abcdef'),
                   'B':[np.nan,5,4,5,5,4],
                   'C':[7,8,9,np.nan,2,3],
                   'D':[1,3,5,np.nan,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (combos)
   A    B    C    D  E  F
0  a  NaN  7.0  1.0  5  a
1  b  5.0  8.0  3.0  3  a
2  c  4.0  9.0  5.0  6  a
3  d  5.0  NaN  NaN  9  b
4  e  5.0  2.0  1.0  2  b
5  f  4.0  3.0  0.0  4  b

#sample function
def zscore(x):
    return x * 100

pa_grade_cols = ['B','C','D']
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.add_prefix('zsores_')
newcombo = pd.concat([combos, zscores], axis=1)
print (newcombo)
   A    B    C    D  E  F  zsores_B  zsores_C  zsores_D
0  a  NaN  7.0  1.0  5  a       NaN       NaN       NaN
1  b  5.0  8.0  3.0  3  a     500.0     800.0     300.0
2  c  4.0  9.0  5.0  6  a     400.0     900.0     500.0
3  d  5.0  NaN  NaN  9  b       NaN       NaN       NaN
4  e  5.0  2.0  1.0  2  b     500.0     200.0     100.0
5  f  4.0  3.0  0.0  4  b     400.0     300.0       0.0

Detail:

print (zscores)
   zsores_B  zsores_C  zsores_D
1     500.0     800.0     300.0
2     400.0     900.0     500.0
4     500.0     200.0     100.0
5     400.0     300.0       0.0

Upvotes: 1

Related Questions