Reputation: 111
I am reading a csv.gz file from S3 that has a string column with empty values. I read the file using the pandas.read_csv() method:
pandas.read_csv(io.BytesIO(csv_data['Body'].read()), sep='|',compression='gzip',
engine='python', error_bad_lines=False, warn_bad_lines=True,
encoding='iso-8859-1',
escapechar='\\',
quoting=1)
I am getting NaN values in the dataframe instead of empty/blank strings in the string column. A couple of questions:
i) Does NaN apply to columns whose dtype is object?
ii) Does NaN apply only to numbers (integers, floats) and not to strings?
Any help would be appreciated. Thanks. Below is the input and actual output I am getting.
Input:
"Obj_ID"|"Value"|"TimeStamp"\n
"ID-1"|"val"| "2020-03-12 00:00:00"
"ID-2"|"v"| "2020-03-12 00:00:00"
"ID-3"|"value-3"| "2020-03-12 00:00:00"
"ID-4"|"value-4"| "2020-03-12 00:00:00"
"ID-5"|""| "2020-03-12 00:00:00"
Actual Output:
Obj_ID Value TimeStamp
0 ID-1 val "2020-03-12 00:00:00"
1 ID-2 v "2020-03-12 00:00:00"
2 ID-3 value-3 "2020-03-12 00:00:00"
3 ID-4 value-4 "2020-03-12 00:00:00"
4 ID-5 NaN "2020-03-12 00:00:00"
Desired output, without manipulating the DataFrame, should be:
Obj_ID Value TimeStamp
0 ID-1 val "2020-03-12 00:00:00"
1 ID-2 v "2020-03-12 00:00:00"
2 ID-3 value-3 "2020-03-12 00:00:00"
3 ID-4 value-4 "2020-03-12 00:00:00"
4 ID-5 '' "2020-03-12 00:00:00"
Upvotes: 0
Views: 107
Reputation: 2803
From the pandas documentation on read_csv:
na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, [...]
This explains why the empty string is interpreted as NaN.
keep_default_na : bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows: [...]
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
So just adding keep_default_na=False as a parameter to read_csv should do what you need.
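As a rough illustration, here is a minimal sketch that compares the default behaviour with keep_default_na=False. It uses a small in-memory gzip payload as a stand-in for the S3 object body. The sample rows, the `payload` variable and the `load` helper are assumptions made for this example, and the other arguments from the question (engine, escapechar, bad-line handling) are left out for brevity. It also shows that NaN is not limited to numeric columns: by default the empty field in the string column comes back as a missing value.

```python
import gzip
import io

import pandas as pd

# Sample data mirroring the question: pipe-separated, fully quoted,
# with one empty "Value" field. (Illustrative rows, not the real file.)
raw = (
    '"Obj_ID"|"Value"|"TimeStamp"\n'
    '"ID-1"|"val"|"2020-03-12 00:00:00"\n'
    '"ID-5"|""|"2020-03-12 00:00:00"\n'
)

# Stand-in for csv_data['Body'].read(): gzip-compressed bytes.
payload = gzip.compress(raw.encode("iso-8859-1"))


def load(keep_default_na):
    """Hypothetical helper: parse the payload with or without default NA strings."""
    return pd.read_csv(
        io.BytesIO(payload),
        sep="|",
        compression="gzip",
        encoding="iso-8859-1",
        quoting=1,  # csv.QUOTE_ALL, as in the question
        keep_default_na=keep_default_na,
    )


default_df = load(True)   # empty string is parsed as a missing value
fixed_df = load(False)    # empty string is kept as ''

print(default_df["Value"].isna().tolist())
print(fixed_df["Value"].tolist())
```

Note that keep_default_na=False applies to every column, so strings such as "NA" or "null" will no longer be converted to NaN either. If some of them should still count as missing, they can be listed explicitly through na_values.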
Upvotes: 1