Reputation: 10862
I am trying to compute zscores for a subset of the columns in my dataframe (combo
) and then create new columns in that data frame for those zscores. Notice that when the zscores are pd.concat'ed the resultant new columns are all NaN. That's the problem I need help with.
I think it might have to do with how concat is adding the new columns, because there's no unique key to match up. But when I tried to preserve the email in the zcores intermediate table it didn't solve the problem. So it might be something else.
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.rename(lambda x:colrename(x, "zscore "), axis=1)
newcombo = pd.concat([combo, zscores], axis=1)
combo.iloc[4]:
email [email protected]
all pas 54.84
all partic 92.21
course 60.39
pa grade PA01 67.7
pa grade PA02 82
pa grade PA03 21
pa grade PA04 0
pa grade PA05 43
pa grade PA06 29
pa grade PA07 61
pa grade PA08 63
pa grade PA09 NaN
pa grade PA10 72
pa grade PA11 0
resub PA01 NaN
resub PA02 NaN
resub PA03 NaN
resub PA04 NaN
resub PA05 NaN
resub PA06 NaN
resub PA07 NaN
resub PA08 NaN
resub PA09 NaN
resub PA10 NaN
resub PA11 NaN
initial PA01 56
initial PA02 83.3333
initial PA03 30
initial PA04 0
initial PA05 61
initial PA06 42
initial PA07 80
initial PA08 90
initial PA09 NaN
initial PA10 97
initial PA11 0
resubmits 0
resub mean NaN
initial mean 53.9333
pa grade mean 43.87
Name: 4, dtype: object
zscores.iloc[4]:
zscore PA01 -0.562523
zscore PA02 -0.418858
zscore PA03 -1.722308
zscore PA04 -1.378762
zscore PA05 -2.291849
zscore PA06 -0.503729
zscore PA07 -0.343543
zscore PA08 -2.037249
zscore PA09 -0.064932
zscore PA10 -0.428859
zscore PA11 -0.735842
Name: 5, dtype: float64
newcombo:
email [email protected]
all pas 54.84
all partic 92.21
course 60.39
pa grade PA01 67.7
pa grade PA02 82
pa grade PA03 21
pa grade PA04 0
pa grade PA05 43
pa grade PA06 29
pa grade PA07 61
pa grade PA08 63
pa grade PA09 NaN
pa grade PA10 72
pa grade PA11 0
resub PA01 NaN
resub PA02 NaN
resub PA03 NaN
resub PA04 NaN
resub PA05 NaN
resub PA06 NaN
resub PA07 NaN
resub PA08 NaN
resub PA09 NaN
resub PA10 NaN
resub PA11 NaN
initial PA01 56
initial PA02 83.3333
initial PA03 30
initial PA04 0
initial PA05 61
initial PA06 42
initial PA07 80
initial PA08 90
initial PA09 NaN
initial PA10 97
initial PA11 0
resubmits 0
resub mean NaN
initial mean 53.9333
pa grade mean 43.87
zscore PA01 NaN
zscore PA02 NaN
zscore PA03 NaN
zscore PA04 NaN
zscore PA05 NaN
zscore PA06 NaN
zscore PA07 NaN
zscore PA08 NaN
zscore PA09 NaN
zscore PA10 NaN
zscore PA11 NaN
Name: 4, dtype: object
Upvotes: 0
Views: 55
Reputation: 862581
It is expected behaviour, because dropna
filter out all rows in subset with NaN
s, so last concat
add only filtered new rows and another values are converted to NaN
s:
combos = pd.DataFrame({'A':list('abcdef'),
'B':[np.nan,5,4,5,5,4],
'C':[7,8,9,np.nan,2,3],
'D':[1,3,5,np.nan,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (combos)
A B C D E F
0 a NaN 7.0 1.0 5 a
1 b 5.0 8.0 3.0 3 a
2 c 4.0 9.0 5.0 6 a
3 d 5.0 NaN NaN 9 b
4 e 5.0 2.0 1.0 2 b
5 f 4.0 3.0 0.0 4 b
#sample function
def zscore(x):
return x * 100
pa_grade_cols = ['B','C','D']
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.add_prefix('zsores_')
newcombo = pd.concat([combos, zscores], axis=1)
print (newcombo)
A B C D E F zsores_B zsores_C zsores_D
0 a NaN 7.0 1.0 5 a NaN NaN NaN
1 b 5.0 8.0 3.0 3 a 500.0 800.0 300.0
2 c 4.0 9.0 5.0 6 a 400.0 900.0 500.0
3 d 5.0 NaN NaN 9 b NaN NaN NaN
4 e 5.0 2.0 1.0 2 b 500.0 200.0 100.0
5 f 4.0 3.0 0.0 4 b 400.0 300.0 0.0
Detail:
print (zscores)
zsores_B zsores_C zsores_D
1 500.0 800.0 300.0
2 400.0 900.0 500.0
4 500.0 200.0 100.0
5 400.0 300.0 0.0
Upvotes: 1