Reputation: 395
I am trying to create the DataFrame below, which deliberately lacks some information: the type column should be empty for one record.
df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', NaN, 'blue', 'blue', 'yellow']}
df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
It works perfectly fine when I fill in all the values, but I keep getting errors when I use NaN, Null, Na, etc.
Any idea what I should put there?
Upvotes: 0
Views: 13395
Reputation: 3618
NaN, Null and Na are not defined names in plain Python, so writing them unquoted raises a NameError; none of them is how Python represents an absent value.
Use Python's None object to represent an absent value.
import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}
df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
If you try to print the df, you'll get the following output:
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
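So you may now think that NaN and None are different. Pandas uses NaN as its standard placeholder for missing values (the converted_tf column, which was never filled in, shows NaN), while a None stored in an object column such as type is displayed as-is. Pandas treats both as missing values. Read more about this here.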
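A minimal sketch (assuming pandas is installed; the variable name missing is just for illustration) that checks which cells pandas considers missing: isna() flags the None in type and the NaN values in converted_tf in exactly the same way.

```python
import pandas as pd

# Same data as above, with None in the 'type' column
data = {'id': [1, 2, 3, 4, 5],
        'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
        'type': ['red', None, 'blue', 'blue', 'yellow']}
df = pd.DataFrame(data, columns=['id', 'created_at', 'type', 'converted_tf'])

# Boolean mask: True wherever pandas sees a missing value (None or NaN)
missing = df.isna()
print(missing)

# Number of missing cells per column
print(missing.sum())
```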
Now let's try the fillna function:
df.fillna('') # filling None or NaN values with empty string
You can see that both NaN and None were replaced by an empty string:
   id  created_at    type converted_tf
0   1  2020-02-01     red
1   2  2020-02-02
2   3  2020-02-02    blue
3   4  2020-02-02    blue
4   5  2020-02-03  yellow
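One detail worth knowing: fillna returns a new DataFrame and leaves df unchanged, so the result has to be assigned. A small sketch, assuming the same data (the name filled is illustrative):

```python
import pandas as pd

data = {'id': [1, 2, 3, 4, 5],
        'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
        'type': ['red', None, 'blue', 'blue', 'yellow']}
df = pd.DataFrame(data, columns=['id', 'created_at', 'type', 'converted_tf'])

# fillna does not modify df; it returns a filled copy
filled = df.fillna('')
print(filled)

print(df.isna().sum().sum())      # 6: one None in 'type' plus five NaN in 'converted_tf'
print(filled.isna().sum().sum())  # 0: every missing cell is now ''
```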
Upvotes: 5
Reputation: 862481
Use np.nan if you need a missing value (the older np.NaN alias was removed in NumPy 2.0):
import numpy as np
import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', np.nan, 'blue', 'blue', 'yellow']}
float('NaN') works too:
df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', float('NaN'), 'blue', 'blue', 'yellow']}
df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
print(df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02     NaN          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
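A small sketch of why np.nan and float('NaN') are interchangeable here: both are ordinary Python floats holding the IEEE 754 "not a number" value. Because NaN never compares equal to anything, not even itself, test for it with math.isnan or pd.isna rather than ==.

```python
import math

import numpy as np
import pandas as pd

a = np.nan
b = float('NaN')

# Both are plain Python floats
print(type(a), type(b))

# NaN is never equal to anything, including itself, so == cannot detect it
print(a == b, a == a)  # False False

# Use a dedicated check instead
print(math.isnan(a), math.isnan(b), pd.isna(a), pd.isna(b))
```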
Or use None; when you process the data in pandas it behaves the same as np.nan in most cases:
df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}
df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
print(df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
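A sketch of where None and np.nan do differ: in a numeric column pandas converts None to NaN and switches to a float dtype, while in an object column it keeps the Python None object. The sketch pins dtype=object for the text column because recent pandas versions may infer a dedicated string dtype that stores missing values differently; treat that as an assumption to check against your pandas version.

```python
import numpy as np
import pandas as pd

# Numeric column: None is converted to NaN and the dtype becomes float64
nums = pd.Series([1, None, 3])
print(nums.dtype)  # float64
print(nums[1])     # nan

# Object column: None is stored as-is
labels = pd.Series(['red', None, 'blue'], dtype=object)
print(labels.dtype)       # object
print(labels[1] is None)  # True

# pandas' missing-value tools treat both the same way
print(nums.isna().tolist())    # [False, True, False]
print(labels.isna().tolist())  # [False, True, False]
```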
Upvotes: 3