NaN values appear when creating a new column based on iteration

I don't know why I get NaN when I create a new column. Here is the code I use:

df_new['age'] = df_new['age'].astype('Int64')
usia = []
for i in df_new['age']:
    if i < 17:
        usia.append('under_age')
    if i >= 17 and i < 31:
        usia.append('dewasa_awal')
    if i >= 31 and i < 46:
        usia.append('dewasa_akhir')
    if i >= 46 and i < 60:
        usia.append('lansia_awal')
    if i >= 60 and i < 75:
        usia.append('lansia')
    if i >= 75:
        usia.append('lansia_akhir')
column = ['kel_usia']
age = pd.DataFrame(usia, columns=column)
df_new = pd.concat([df_new, age], axis=1).reindex(df_new.index)

The result is shown below:

1        dewasa_akhir
4        dewasa_akhir
5         dewasa_awal
6         lansia_awal
7         lansia_awal
    ...     

32948             NaN
32949             NaN

Thanks in advance.

Upvotes: 1

Views: 39

Answers (2)

Corralien

Reputation: 120489

To avoid this error with NaN and simplify your code, you can use pd.cut:

Input data:

>>> df_new
   age
0   93
1   26
2   16
3   34
4   58
5   76
6   68
7   14
8   77
9   84
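# right=False makes the bins left-closed ([17, 31), ...), matching the question's conditions
df_new['kel_usia'] = pd.cut(df_new['age'], bins=[-1, 17, 31, 46, 60, 75, 9999],
                            right=False,
                            labels=['under_age', 'dewasa_awal', 'dewasa_akhir',
                                    'lansia_awal', 'lansia', 'lansia_akhir'])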

Output result:

   age      kel_usia
0   93  lansia_akhir
1   26   dewasa_awal
2   16     under_age
3   34  dewasa_akhir
4   58   lansia_awal
5   76  lansia_akhir
6   68        lansia
7   14     under_age
8   77  lansia_akhir
9   84  lansia_akhir
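
One detail worth checking with pd.cut is which side of each interval is closed, because it decides where an age that sits exactly on a bin edge (17, 31, 75, ...) ends up. The sketch below uses a few made-up edge ages, which are an assumption for illustration and not part of the original data, and compares the default right=True with right=False:

```python
import pandas as pd

# Hypothetical ages chosen to sit on or next to the bin edges.
ages = pd.Series([16, 17, 30, 31, 75])
bins = [-1, 17, 31, 46, 60, 75, 9999]
labels = ["under_age", "dewasa_awal", "dewasa_akhir",
          "lansia_awal", "lansia", "lansia_akhir"]

# Default right=True: intervals are (-1, 17], (17, 31], ...
# so an age of exactly 17 is still labelled "under_age".
closed_right = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [-1, 17), [17, 31), ...
# which matches conditions such as `i >= 17 and i < 31`.
closed_left = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"age": ages,
                    "right=True": closed_right,
                    "right=False": closed_left}))
```

For ages that never land on an edge, both settings give the same labels, so the difference only shows up at 17, 31, 46, 60 and 75.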

Upvotes: 0

Mustafa Aydın

Reputation: 18315

Actually, you don't need these lines:

age = pd.DataFrame(usia, columns=column)
df_new = pd.concat([df_new, age], axis=1).reindex(df_new.index)

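because the first line creates a new frame with the default index 0..N-1, while df_new's index is different (it skips labels such as 0, 2 and 3 in your output). pd.concat with axis=1 aligns rows by index label, so the result has the union of both indexes, and NaNs appear wherever a label exists in only one of the frames. Reindexing with df_new's index afterwards does not help: it keeps only df_new's labels, but every label outside 0..N-1 still has no kel_usia value, and the labels that do match are paired with the wrong row.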
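
To make the alignment issue concrete, here is a minimal sketch on a toy frame. The three ages and the index labels 1, 4, 5 are made up for illustration; only the two problematic lines come from the question:

```python
import pandas as pd

# Toy frame whose index has gaps, like df_new after some rows were dropped.
df_new = pd.DataFrame({"age": [35, 20, 50]}, index=[1, 4, 5])

# A frame built from a plain list gets the default index 0, 1, 2.
age = pd.DataFrame(["dewasa_akhir", "dewasa_awal", "lansia_awal"],
                   columns=["kel_usia"])

# concat(axis=1) aligns on index labels, so the result has labels 0, 1, 2, 4, 5
# and NaN wherever a label is missing from one of the two frames.
combined = pd.concat([df_new, age], axis=1)
print(combined)

# reindex keeps only labels 1, 4, 5. Labels 4 and 5 never existed in `age`,
# and label 1 picks up the value that was meant for the second row (age 20).
result = combined.reindex(df_new.index)
print(result)
```

Note that the row with label 1 is not NaN, but its label is still wrong: it holds age 35 next to "dewasa_awal".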

So, a fix is to assign the column directly to df_new, without the two lines above. This leaves df_new's index unchanged and adds a new column filled with usia's values:

df_new["kel_usia"] = usia
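
A plain list carries no index, so pandas assigns it by position, which is why this works regardless of df_new's labels. A small sketch with a made-up frame (the ages and the index 1, 4, 5 are assumptions for illustration):

```python
import pandas as pd

# Toy frame with a non-default index, standing in for the real df_new.
df_new = pd.DataFrame({"age": [35, 20, 50]}, index=[1, 4, 5])

# Labels as the loop in the question would produce them, one per row.
usia = ["dewasa_akhir", "dewasa_awal", "lansia_awal"]

# Positional assignment: the first list item goes to the first row, and so on.
# The list must have exactly as many items as the frame has rows.
df_new["kel_usia"] = usia
print(df_new)
```

If the list were shorter or longer than the frame, pandas would raise a ValueError instead of filling in NaN.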

(By the way, your chain of if statements can also be replaced with something like np.select or pd.cut, but I believe that's not the main focus of the question.)
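
For completeness, a rough sketch of the np.select variant. It assumes the age column is a plain integer column without missing values; the sample ages are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample data, assumed for illustration only.
df_new = pd.DataFrame({"age": [93, 26, 16, 34, 58, 76, 68, 14, 77, 84]})
age = df_new["age"]

# np.select takes the first condition that is True for each row,
# so each upper bound only needs to be checked once, in ascending order.
conditions = [age < 17, age < 31, age < 46, age < 60, age < 75]
choices = ["under_age", "dewasa_awal", "dewasa_akhir", "lansia_awal", "lansia"]

# Rows matching none of the conditions (age >= 75) get the default label.
df_new["kel_usia"] = np.select(conditions, choices, default="lansia_akhir")
print(df_new)
```

Because the result is assigned as an array, it is also positional and does not depend on df_new's index.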

Upvotes: 1
