Reputation: 75
I don't know why I get NaN when I create a new column. Here is the code I use:
df_new['age'] = df_new['age'].astype('Int64')
usia = []
for i in df_new['age']:
    if i < 17:
        usia.append('under_age')
    if i >= 17 and i < 31:
        usia.append('dewasa_awal')
    if i >= 31 and i < 46:
        usia.append('dewasa_akhir')
    if i >= 46 and i < 60:
        usia.append('lansia_awal')
    if i >= 60 and i < 75:
        usia.append('lansia')
    if i >= 75:
        usia.append('lansia_akhir')
column = ['kel_usia']
age = pd.DataFrame(usia, columns=column)
df_new = pd.concat([df_new, age], axis=1).reindex(df_new.index)
The result is shown below:
1 dewasa_akhir
4 dewasa_akhir
5 dewasa_awal
6 lansia_awal
7 lansia_awal
...
32948 NaN
32949 NaN
Thanks in advance.
Upvotes: 1
Views: 39
Reputation: 120489
To avoid this error with NaN and simplify your code, you can use pd.cut:
Input data:
>>> df_new
age
0 93
1 26
2 16
3 34
4 58
5 76
6 68
7 14
8 77
9 84
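# right=False makes the bins left-closed ([17, 31), [31, 46), ...),
# matching the `i >= 17 and i < 31` style tests in the question
df_new['kel_usia'] = pd.cut(df_new['age'], bins=[-1, 17, 31, 46, 60, 75, 9999], right=False,
                            labels=['under_age', 'dewasa_awal', 'dewasa_akhir',
                                    'lansia_awal', 'lansia', 'lansia_akhir'])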
Output result:
age kel_usia
0 93 lansia_akhir
1 26 dewasa_awal
2 16 under_age
3 34 dewasa_akhir
4 58 lansia_awal
5 76 lansia_akhir
6 68 lansia
7 14 under_age
8 77 lansia_akhir
9 84 lansia_akhir
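One detail worth checking with pd.cut is which side of each bin is closed. By default the intervals are closed on the right, so an edge value such as 17 falls into the lower bin, whereas the loop in the question puts 17 into 'dewasa_awal'. A minimal sketch of the difference, using made-up ages that sit on or next to the edges (the variable names here are only for illustration):

```python
import pandas as pd

# Made-up ages on or next to the bin edges
ages = pd.Series([16, 17, 30, 31, 75])
bins = [-1, 17, 31, 46, 60, 75, 9999]
labels = ['under_age', 'dewasa_awal', 'dewasa_akhir',
          'lansia_awal', 'lansia', 'lansia_akhir']

# Default right=True: intervals are (-1, 17], (17, 31], ...
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [-1, 17), [17, 31), ...
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

With right=False the edge values 17, 31 and 75 go to the upper group, which is what the `i >= 17 and i < 31` style conditions in the question do.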
Upvotes: 0
Reputation: 18315
Actually you don't need to do these lines:
age = pd.DataFrame(usia, columns=column)
df_new = pd.concat([df_new, age], axis=1).reindex(df_new.index)
because the first line creates a new frame with index 0..N-1, but the concat in the second line aligns on the index and produces the union of the two indices, i.e. that of df_new (which differs from 0..N-1) and 0..N-1, so NaNs appear. You then reindex with df_new's index, but because of that mismatch the NaNs are still there.
So a fix is to assign the column directly to df_new, without the two lines above. This leaves df_new's index untouched and adds a new column filled with usia's values:
df_new["kel_usia"] = usia
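To see the misalignment on a small scale, here is a minimal sketch with a made-up three-row frame whose index has gaps (as it would after filtering rows out). The data and the names `misaligned` and `fixed` are only for illustration:

```python
import pandas as pd

# Hypothetical frame whose index is not 0..N-1
df_new = pd.DataFrame({'age': [34, 40, 25]}, index=[1, 4, 5])
usia = ['dewasa_akhir', 'dewasa_akhir', 'dewasa_awal']

# A frame built from a plain list gets a fresh 0..N-1 index
age = pd.DataFrame(usia, columns=['kel_usia'])

# concat aligns on index labels, so only labels present in both frames get a value
misaligned = pd.concat([df_new, age], axis=1).reindex(df_new.index)

# Assigning the list directly is positional, so every row is filled
fixed = df_new.copy()
fixed['kel_usia'] = usia

print(misaligned)
print(fixed)
```

Here only label 1 exists in both indexes, so rows 4 and 5 end up with NaN. Note also that label 1 receives usia[1], i.e. the value computed for the second row, so even the non-NaN entries may belong to a different row.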
(By the way, your if-elif-else chain can also be replaced with something like np.select or pd.cut, but I believe that's not the main focus of the question.)
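For completeness, a rough sketch of the np.select variant, assuming a plain integer age column with no missing values (the sample ages and index below are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index
df_new = pd.DataFrame({'age': [16, 17, 30, 31, 59, 60, 75]},
                      index=[1, 4, 5, 6, 7, 9, 12])

age = df_new['age']
# Conditions are checked in order and the first True one wins,
# so each upper bound alone is enough
conditions = [age < 17, age < 31, age < 46, age < 60, age < 75]
choices = ['under_age', 'dewasa_awal', 'dewasa_akhir', 'lansia_awal', 'lansia']

# Rows matching none of the conditions (age >= 75) get the default
df_new['kel_usia'] = np.select(conditions, choices, default='lansia_akhir')

print(df_new)
```

Since the result is a NumPy array assigned directly to the column, it is placed positionally and df_new's index stays as it is.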
Upvotes: 1