Reputation: 5660
My question is related to this one. I have a file named 'test.csv' with 'NA' as a value for region
. I want to read this in as 'NA', not 'NaN'. However, there are missing values in other columns in test.csv, which I want to retain as 'NaN'. How can I do this?
# test.csv looks like this:
Here's what I've tried:
import pandas as pd
# This reads NA as NaN
df = pd.read_csv(test.csv)
df
region date expenses
0 NaN 1/1/2019 53
1 EU 1/2/2019 NaN
# This reads NA as NA, but doesn't read missing expense as NaN
df = pd.read_csv('test.csv', keep_default_na=False, na_values='_')
df
region date expenses
0 NA 1/1/2019 53
1 EU 1/2/2019
# What I want:
region date expenses
0 NA 1/1/2019 53
1 EU 1/2/2019 NaN
The problem with adding the argument keep_default_na=False
is that the second value for expenses
does not get read in as NaN
. So if I then try pd.isnull(df['value'][1])
this is returned as False
.
Upvotes: 6
Views: 5648
Reputation: 301
This approach work for me:
import pandas as pd
df = pd.read_csv('Test.csv')
co1 col2 col3 col4
a b c d e
NaN NaN NaN NaN NaN
2 3 4 5 NaN
I copied the value and created a list which are by default interpreted as NaN then comment out NA which I wanted to be interpreted as not NaN. This approach still treat other values as NaN except for NA.
#You can also create your own list of value that should be treated as NaN and
# then pass the values to na_values and set keep_default_na=False.
na_values = ["",
"#N/A",
"#N/A N/A",
"#NA",
"-1.#IND",
"-1.#QNAN",
"-NaN",
"-nan",
"1.#IND",
"1.#QNAN",
"<NA>",
"N/A",
# "NA",
"NULL",
"NaN",
"n/a",
"nan",
"null"]
df1 = pd.read_csv('Test.csv',na_values=na_values,keep_default_na=False )
co1 col2 col3 col4
a b c d e
NaN NA NaN NA NaN
2 3 4 5 NaN
Upvotes: 0
Reputation: 150735
For me, this works:
df = pd.read_csv('file.csv', keep_default_na=False, na_values=[''])
which gives:
region date expenses
0 NA 1/1/2019 53.0
1 EU 1/2/2019 NaN
But I'd rather play safe, due to possible other NaN
in other columns, and do
df = pd.read_csv('file.csv')
df['region'] = df['region'].fillna('NA')
Upvotes: 5
Reputation: 743
when specifying keep_default=False
all defaults values are not considered as nan so you should specify them:
use keep_default_na=False, na_values= [‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’]
Upvotes: 0