LeroyFromBerlin

Reputation: 395

How to create dataframe in pandas that contains Null values

I am trying to create the DataFrame below, which deliberately lacks one piece of information: type should be empty for one record.

import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', NaN, 'blue', 'blue', 'yellow']}  # NameError: name 'NaN' is not defined

df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])

It works fine when I fill in every value, but I keep getting errors when I use NaN, Null, Na, etc.

Any idea what I should use instead?

Upvotes: 0

Views: 13395

Answers (2)

CodeIt

Reputation: 3618

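In plain Python, NaN, Null and Na are not defined names, so writing them bare raises a NameError; none of them represents an absence of value on its own.

Use Python's None object to represent the absence of a value.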

import pandas as pd

data = {'id': [1, 2, 3, 4, 5],
        'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
        'type': ['red', None, 'blue', 'blue', 'yellow']}

# 'converted_tf' is not a key in data, so pandas fills that column with NaN
df = pd.DataFrame(data, columns=['id', 'created_at', 'type', 'converted_tf'])

If you print df, you get the following output:

   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN

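So you may now think that NaN and None are different. Pandas treats both as missing values: NaN is its standard placeholder for missing data, and depending on a column's dtype a None is either converted to NaN (as in numeric columns) or kept and displayed as None (as in the object column type above). Methods such as isna() and fillna() handle both the same way. The pandas user guide section on working with missing data covers this in more detail.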
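A quick way to confirm that pandas counts None as missing is to call isna(). This is a minimal sketch using a trimmed-down version of the frame above; the numeric Series nums is an illustrative addition showing the None-to-NaN conversion in a float column.

```python
import pandas as pd

# Trimmed-down version of the frame above: 'type' holds a None.
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'type': ['red', None, 'blue', 'blue', 'yellow']})

# isna() flags the None as missing.
print(df['type'].isna().tolist())  # [False, True, False, False, False]

# Hypothetical numeric column: pandas stores None as NaN in a float Series.
nums = pd.Series([1.5, None, 3.0])
print(nums.dtype)         # float64
print(nums.isna().sum())  # 1
```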

Now let's try the fillna function:

print(df.fillna(''))  # replace both None and NaN with an empty string

You can see that both NaN and None are replaced by an empty string.

   id  created_at    type converted_tf
0   1  2020-02-01     red
1   2  2020-02-02
2   3  2020-02-02    blue
3   4  2020-02-02    blue
4   5  2020-02-03  yellow
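Note that fillna returns a new DataFrame and leaves the original untouched unless you assign the result back. The sketch below shows that, plus filling a single column by passing a dict of column to fill value; the placeholder 'unknown' is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5],
     'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
     'type': ['red', None, 'blue', 'blue', 'yellow']},
    columns=['id', 'created_at', 'type', 'converted_tf'],
)

# fillna() returns a new DataFrame; df itself still has its missing value.
filled = df.fillna('')
print(df['type'].isna().sum())      # 1
print(filled['type'].isna().sum())  # 0

# Fill only the 'type' column ('unknown' is a hypothetical placeholder).
partial = df.fillna({'type': 'unknown'})
print(partial['type'].tolist())              # ['red', 'unknown', 'blue', 'blue', 'yellow']
print(partial['converted_tf'].isna().all())  # True
```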

Upvotes: 5

jezrael

Reputation: 862481

Use np.nan if you need a missing value (the np.NaN alias was removed in NumPy 2.0):

import numpy as np
import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', np.nan, 'blue', 'blue', 'yellow']}

float('NaN') works too:

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', float('NaN'), 'blue', 'blue', 'yellow']}

df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
print(df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02     NaN          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
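Both np.nan and float('NaN') are ordinary Python floats holding the same IEEE NaN value. Because NaN never compares equal to anything, including itself, == cannot be used to find it; use math.isnan, pd.isna or Series.isna instead. A minimal sketch, using a small float Series as an illustrative example:

```python
import math

import numpy as np
import pandas as pd

a = np.nan
b = float('NaN')
print(type(a), type(b))  # <class 'float'> <class 'float'>

# NaN is not equal to anything, not even itself.
print(a == a)  # False
print(a == b)  # False

# Dedicated checks work for both spellings.
print(math.isnan(a), pd.isna(b))  # True True

# The same applies inside a Series: == never matches NaN, isna() does.
s = pd.Series([1.0, np.nan, 3.0])
print((s == np.nan).any())  # False
print(s.isna().tolist())    # [False, True, False]
```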

Or use None; when processing data in pandas it behaves the same as np.nan most of the time:

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}

df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
print(df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
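Where None and np.nan do differ is in what is actually stored. The sketch below (illustrative Series, not from the question) shows that isna() and dropna() treat both alike, that an object column keeps None as-is, and that a float column converts None to NaN:

```python
import numpy as np
import pandas as pd

# Object column holding both kinds of missing value.
obj = pd.Series(['red', None, np.nan, 'blue'], dtype=object)

# pandas treats both as missing.
print(obj.isna().tolist())    # [False, True, True, False]
print(obj.dropna().tolist())  # ['red', 'blue']

# With dtype=object each value is stored exactly as given.
print(obj[1] is None)              # True
print(isinstance(obj[2], float))   # True

# In a float column, None is converted to NaN on construction.
num = pd.Series([1.0, None, 3.0])
print(num[1] is None)     # False
print(np.isnan(num[1]))   # True
```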

Upvotes: 3
