Reputation: 43
I have a CSV file with a Country column containing country codes. In this column, "NA" means North America. I need to convert the file to UTF-8. With the code below, the cells containing "NA" come out blank in the exported file:
df = pd.read_csv(filepath, encoding='UTF-8')
df.to_csv(r'path+filename',header=None ,encoding = 'UTF-8', index = False)
For example,
Input file:
Week Country PL Sales$
W01 AE 0I 250
W02 NA 0I 130
Output file:
Week Country PL Sales$
W01 AE 0I 250
W02 0I 130
I also tried putting "NA" in other columns of the source file, and those cells come out blank as well.
Upvotes: 3
Views: 570
Reputation: 39810
'NA' is one of the strings that pd.read_csv() treats as NaN by default (the default list that na_values adds to). You need to tell pandas to leave out those defaults when reading the CSV file, using the keep_default_na parameter of pd.read_csv():
keep_default_na: bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
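As a rough illustration of the cases quoted above, the sketch below counts how many Country cells end up as NaN under each combination. The sample data, the "missing" marker and the helper name count_missing_countries are made up for this example and are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data: "NA" is a real country code here,
# and "missing" is an invented marker for an unknown value.
CSV_TEXT = "Week,Country,Sales\nW01,AE,250\nW02,NA,130\nW03,missing,90\n"


def count_missing_countries(**read_csv_kwargs):
    """Return how many Country cells pandas parsed as NaN."""
    df = pd.read_csv(io.StringIO(CSV_TEXT), **read_csv_kwargs)
    return int(df["Country"].isna().sum())


# Defaults only: "NA" is parsed as NaN, "missing" stays a string.
defaults_only = count_missing_countries()

# Defaults plus custom values: both "NA" and "missing" become NaN.
defaults_plus_custom = count_missing_countries(na_values=["missing"])

# Custom values only: "missing" becomes NaN, "NA" stays a string.
custom_only = count_missing_countries(keep_default_na=False, na_values=["missing"])

# No NaN strings at all: every cell stays a string.
nothing_parsed = count_missing_countries(keep_default_na=False)

# na_filter=False: na_values and keep_default_na are ignored.
filter_off = count_missing_countries(na_filter=False, na_values=["missing"])

print(defaults_only, defaults_plus_custom, custom_only, nothing_parsed, filter_off)
```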
This should do the trick:
df = pd.read_csv(filepath, encoding='UTF-8', keep_default_na=False)
Depending on what other operations you want to perform, you might also need to set na_values explicitly, so that values which really are missing are still parsed as NaN.
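A minimal sketch of that combination, assuming comma-separated data shaped like the example in the question (the third row with an empty Country cell is invented for illustration): keep_default_na=False keeps "NA" as text, while na_values=[""] still treats empty cells as missing.

```python
import io

import pandas as pd

# Hypothetical data modelled on the question; the W03 row with an
# empty Country cell is added only to show a genuinely missing value.
source = (
    "Week,Country,PL,Sales\n"
    "W01,AE,0I,250\n"
    "W02,NA,0I,130\n"
    "W03,,0I,75\n"
)

# Drop the default NaN strings, but still treat empty cells as missing.
df = pd.read_csv(io.StringIO(source), keep_default_na=False, na_values=[""])

# Write to an in-memory buffer instead of a file to inspect the result.
buffer = io.StringIO()
df.to_csv(buffer, index=False, encoding="UTF-8")
exported_lines = buffer.getvalue().splitlines()

print(exported_lines[2])
```

Here the "NA" code should survive the round trip, and the empty cell is written back as an empty field because to_csv represents NaN as an empty string by default.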
Upvotes: 4