Developer 2023

Reputation: 43

"NA" in a column (meaning North America) shows as blank after the CSV is read by pandas

I have a CSV file with a Country column containing country codes, where "NA" means North America. I need to re-save the file as UTF-8. With the code below, the rows containing "NA" come out blank in the exported file:

df = pd.read_csv(filepath, encoding='UTF-8')
df.to_csv(r'path+filename', header=None, encoding='UTF-8', index=False)

For example,

Input file:

Week Country PL Sales$
W01   AE     0I  250
W02   NA     0I  130

Output file:

Week Country PL Sales$
W01   AE     0I  250
W02          0I  130

I also tried putting "NA" in other columns of the source file, and those values come out blank as well.

Upvotes: 3

Views: 570

Answers (1)

Giorgos Myrianthous

Reputation: 39810

'NA' is among the strings that pandas treats as NaN by default when reading a file, which is why it is written back out as an empty field. To keep it as text, tell pd.read_csv() not to apply the default NaN values. From the documentation:

keep_default_na: bool, default True

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.

If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.

If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.

If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
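The four combinations above can be checked on a small in-memory file. This is only a sketch: the comma-separated sample data and the "-" marker passed to na_values are assumptions made for illustration, not part of the original file.

```python
import io

import pandas as pd

# Hypothetical sample data; "-" is an assumed custom missing-value marker.
CSV_TEXT = "Week,Country,PL,Sales\nW01,AE,0I,250\nW02,NA,0I,130\nW03,-,0I,90\n"


def country_is_missing(**kwargs):
    """Read the sample and report which Country cells were parsed as NaN."""
    df = pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)
    return df["Country"].isna().tolist()


# keep_default_na=True, no na_values: only the defaults apply, so "NA" is lost.
default_mask = country_is_missing()

# keep_default_na=True plus na_values: "-" is added on top of the defaults.
extended_mask = country_is_missing(na_values=["-"])

# keep_default_na=False plus na_values: only "-" counts as missing.
custom_only_mask = country_is_missing(keep_default_na=False, na_values=["-"])

# keep_default_na=False, no na_values: nothing is parsed as NaN.
nothing_mask = country_is_missing(keep_default_na=False)

for name, mask in [
    ("defaults only", default_mask),
    ("defaults + '-'", extended_mask),
    ("'-' only", custom_only_mask),
    ("no NaN parsing", nothing_mask),
]:
    print(f"{name}: {mask}")
```

In the last two cases the "NA" row keeps its text, which is the behavior the question is after.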


This should do the trick:

df = pd.read_csv(filepath, encoding='UTF-8', keep_default_na=False)
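As a quick way to confirm the fix, here is a round trip through read_csv and to_csv using in-memory buffers instead of real files. The comma-separated layout of the sample is an assumption, since the question only shows the data as displayed.

```python
import io

import pandas as pd

# Assumed comma-separated version of the sample shown in the question.
source = "Week,Country,PL,Sales$\nW01,AE,0I,250\nW02,NA,0I,130\n"

# Read without the default NaN strings so "NA" stays a plain string.
df = pd.read_csv(io.StringIO(source), keep_default_na=False)

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
exported_lines = buffer.getvalue().splitlines()

print(exported_lines)
```

The "NA" country code should appear unchanged in the exported rows.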

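Keep in mind that with keep_default_na=False none of the default markers (the empty string is one of them) are parsed as NaN any more. If later operations rely on some values being treated as missing, pass those markers explicitly via na_values.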
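For instance, a common combination is to switch off the defaults but still treat empty fields as missing. The sketch below assumes a small comma-separated sample with one empty Country and one empty Sales field; the data is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: row W02 has no Sales value, row W03 has no Country.
sample = "Week,Country,Sales\nW01,AE,250\nW02,NA,\nW03,,90\n"

# Drop the default NaN strings, but keep the empty string as a missing marker.
df = pd.read_csv(io.StringIO(sample), keep_default_na=False, na_values=[""])

country_missing = df["Country"].isna().tolist()
sales_missing = df["Sales"].isna().tolist()

print(df)
```

Here "NA" survives as text while the genuinely empty cells are still NaN, so functions such as dropna() or fillna() keep working on them.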

Upvotes: 4
