Reputation: 10383
I'm working with data in .csv format and want to set all the empty cells to an empty string.
The problem I'm facing is that those files have been manipulated by several people in different environments, so these cells contain a variety of junk values, such as:
' '
'NaN'
'nan'
'\n'
' '
And so on.
I'm looking for a standard way to identify all of these types of "junk values."
Upvotes: 0
Views: 105
Reputation: 444
I think pandas.DataFrame.replace would be a good fit for your problem.
Here is some sample code:
import pandas as pd

# Sample data containing several kinds of junk values
dic = {'a': ['NAN', '', 'NaN'], 'b': ['', 'nan', '\n'], 'c': [1, '2', '3']}
df = pd.DataFrame(dic)

# Replace each listed junk value with an empty string
# (note that 'NAN' is not in the list, so it is left as-is)
replace_list = ['NaN', '', 'nan', '\n']
df_clean = df.replace(replace_list, '')
print(df_clean)
You can read your csv data into pandas and do the same thing.
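A minimal sketch of that workflow, with data.csv as a placeholder filename; keep_default_na=False stops pandas from converting strings like 'NaN' into actual NaN values on read, so the replace list can catch them as literal strings:

import pandas as pd

# dtype=str keeps every column as text; keep_default_na=False keeps
# 'NaN', 'nan', '' etc. as literal strings instead of NaN
df = pd.read_csv('data.csv', dtype=str, keep_default_na=False)
replace_list = ['NaN', 'nan', '\n', ' ']
df_clean = df.replace(replace_list, '')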
Hope it helps.
Upvotes: 0
Reputation: 1410
You can use the isspace method, which would eliminate whitespace values like ' ' and '\n' but would not handle values like 'NaN' or 'nan'. There isn't really a standard way to deal with these, so in addition to using isspace I would also create a blacklist, e.g.:

blacklist = ['NaN', 'nan'] # add more as needed

Then use isspace() plus your blacklist to filter out unwanted values.
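A minimal sketch of that combination, assuming the values arrive as Python strings (clean_value is just an illustrative name):

blacklist = ['NaN', 'nan']  # add more as needed

def clean_value(value):
    # Pure-whitespace values (' ', '\n', ...) and blacklisted values become ''
    if value.isspace() or value in blacklist:
        return ''
    return value

print([clean_value(v) for v in [' ', 'NaN', '\n', 'data']])
# ['', '', '', 'data']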
Upvotes: 2
Reputation: 375474
Use .strip() to remove whitespace, and then check if the value is one you want to ignore:
if value.strip() in ['', 'NaN', 'nan']:
# ignore this value
Or, make it case-insensitive:
if value.strip().lower() in ['', 'nan']:
# ignore this value
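A short sketch applying that check to rows read with the standard csv module; data.csv is a placeholder filename:

import csv

junk = ['', 'nan']

with open('data.csv', newline='') as f:
    # Replace any value that strips/lowers to a junk entry with ''
    rows = [['' if value.strip().lower() in junk else value for value in row]
            for row in csv.reader(f)]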
Upvotes: 4
Reputation:
You could read the csv into a Pandas DataFrame, and then use DataFrame.fillna().
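A minimal sketch, with data.csv as a placeholder filename. By default pandas.read_csv already parses 'NaN', 'nan', and empty fields as missing; the na_values parameter extends that set to other junk values:

import pandas as pd

# Junk values become NaN on read; fillna then turns them into ''
df = pd.read_csv('data.csv', na_values=[' ', '\n'])
df_clean = df.fillna('')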
Upvotes: 0