NaN values when creating a new column from a loop

Question

I don't know why I get NaN values when I create a new column. Here is the code I use:

df_new['age'] = df_new['age'].astype('Int64')
usia = []
for i in df_new['age']:
    if i < 17:
        usia.append('under_age')
    if i >= 17 and i < 31:
        usia.append('dewasa_awal')
    if i >= 31 and i < 46:
        usia.append('dewasa_akhir')
    if i >= 46 and i < 60:
        usia.append('lansia_awal')
    if i >= 60 and i < 75:
        usia.append('lansia')
    if i >= 75:
        usia.append('lansia_akhir')
column = ['kel_usia']
age = pd.DataFrame(usia, columns=column)
df_new = pd.concat([df_new, age], axis=1).reindex(df_new.index)

The result is shown below:

1        dewasa_akhir
4        dewasa_akhir
5         dewasa_awal
6         lansia_awal
7         lansia_awal
    ...     

32948             NaN
32949             NaN

Thanks in advance.

Mustafa Aydın · Accepted Answer

Actually, you don't need these lines:

age = pd.DataFrame(usia, columns=column)
df_new = pd.concat([df_new, age], axis=1).reindex(df_new.index)

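because the first line builds a new frame with a default index 0..N-1, and the concat in the second line aligns the two frames on index labels. The result therefore has the union of both indexes, i.e. that of df_new (which differs from 0..N-1) and 0..N-1, and NaNs appear wherever a label exists in only one of them. The reindex with df_new's index then keeps only df_new's labels, but every label that is missing from 0..N-1 still has no value from usia, so it stays NaN. Labels that do happen to match are paired by label rather than by position, so even the non-NaN values can end up on the wrong row.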
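To see the effect in isolation, here is a minimal sketch on made-up data. The frame df_demo, its index values, and the ages are hypothetical stand-ins for df_new, chosen only so that the index has gaps like the one in the question.

```python
import pandas as pd

# Hypothetical frame whose index has gaps, as if rows had been filtered out earlier.
df_demo = pd.DataFrame({"age": [35, 20, 50, 65]}, index=[1, 4, 5, 6])
usia = ["dewasa_akhir", "dewasa_awal", "lansia_awal", "lansia"]

# A frame built from a plain list gets a fresh default index: 0, 1, 2, 3.
age = pd.DataFrame(usia, columns=["kel_usia"])

# concat with axis=1 aligns on index labels, so the result covers labels 0..6.
combined = pd.concat([df_demo, age], axis=1)

# Reindexing keeps only df_demo's labels: 1, 4, 5, 6.
result = combined.reindex(df_demo.index)
print(result)
# Label 1 picks up the value stored under label 1 in `age` (the second list item),
# which is the wrong row; labels 4, 5 and 6 have no match and come out as NaN.
```

Only label 1 exists in both indexes here, which is why three of the four rows end up as NaN.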

So the fix is to drop the two lines above and assign the list directly to df_new as a new column. This leaves df_new's index untouched and fills the new column with usia's values in row order:

df_new["kel_usia"] = usia

(By the way, your chain of if statements can also be replaced with something like np.select or pd.cut, but I believe that's not the main focus of the question.)
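For reference, a rough sketch of the pd.cut variant could look like the following. The frame df_demo and its values are made up for illustration; the bin edges are my reading of the thresholds in the question, with right=False so that each bin includes its lower edge and excludes its upper one.

```python
import numpy as np
import pandas as pd

# Hypothetical ages on a non-contiguous index, standing in for df_new.
df_demo = pd.DataFrame(
    {"age": [10, 17, 30, 31, 46, 60, 75]},
    index=[1, 4, 5, 6, 7, 9, 12],
)

# Edges taken from the question's thresholds; with right=False each bin is [low, high).
bins = [-np.inf, 17, 31, 46, 60, 75, np.inf]
names = ["under_age", "dewasa_awal", "dewasa_akhir", "lansia_awal", "lansia", "lansia_akhir"]

# pd.cut returns a Series with the same index as its input, so plain assignment lines up.
df_demo["kel_usia"] = pd.cut(df_demo["age"], bins=bins, labels=names, right=False)
print(df_demo)
```

The new column is categorical rather than plain strings; if strings are needed, .astype(str) converts it.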
