How to check whether a variable is empty in Python

I'm working with data in .csv format and want to set all the empty cells to the value of an empty string.

The problem I'm facing is that these files have been edited by several people in different environments, so the cells contain a variety of junk values, such as:

' '
'NaN'
'nan'
'\n'
'   '

And so on.

I'm looking for a standard way to identify all of these types of "junk values."

Upvotes: 0

Views: 105

Answers (4)

Wenlong Liu

Reputation: 444

I think pandas' DataFrame.replace would be a good option for your problem.

Here is some sample code:

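import pandas as pd

# Sample data containing several kinds of junk values
dic = {'a': ['NAN', '', 'NaN'], 'b': ['', 'nan', '\n'], 'c': [1, '2', '3']}
df = pd.DataFrame(dic)

# Values to treat as empty; matching is exact and case-sensitive
replace_list = ['NaN', 'NAN', '', 'nan', '\n']
df_clean = df.replace(replace_list, '')
print(df_clean)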

You can load your CSV data into pandas and do the same thing.
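Since an exact-match list has to spell out every variant, a regular expression may be easier to maintain. The following is only a sketch: the pattern and the name junk_pattern are my own assumptions about what counts as junk (empty, whitespace-only, or "nan" in any letter case), so adjust them to your data.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['NAN', '', 'NaN'],
    'b': ['  ', 'nan', '\n'],
    'c': [1, '2', '3'],
})

# Assumed definition of junk: empty, whitespace-only, or "nan" in any case,
# optionally surrounded by whitespace
junk_pattern = r'(?i)^\s*(?:nan)?\s*$'

# With regex=True, only string cells are tested against the pattern;
# non-string cells such as the integer 1 are left untouched
df_clean = df.replace(to_replace=junk_pattern, value='', regex=True)
print(df_clean)
```

This also covers cells such as ' NaN ' that an exact-match list would miss.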

Hope it helps.

Upvotes: 0

nb1987

Reputation: 1410

You can use the str.isspace() method, which identifies whitespace-only values like ' ' and '\n' but does not handle values like 'NaN' or 'nan'. There isn't really a standard way to deal with these, so in addition to using isspace() I would also create a blacklist, e.g.:

blacklist = ['NaN', 'nan'] # add more as needed

Then use isspace() plus your blacklist to filter out unwanted values.
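Put together, that could look roughly like the sketch below. The names BLACKLIST and is_junk are only for illustration. One thing to watch for: ''.isspace() returns False, so the empty string needs its own check.

```python
BLACKLIST = {'NaN', 'nan'}  # illustrative name; add more values as needed


def is_junk(value: str) -> bool:
    """Return True for empty, whitespace-only, or blacklisted strings."""
    # str.isspace() is False for the empty string, so test it explicitly
    return value == '' or value.isspace() or value in BLACKLIST


cells = [' ', 'NaN', 'nan', '\n', '   ', '', 'hello']
cleaned = ['' if is_junk(cell) else cell for cell in cells]
print(cleaned)
```

Note that a value like ' NaN ' (with surrounding spaces) is not caught by this version; strip the value before the blacklist lookup if you need that.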

Upvotes: 2

Ned Batchelder

Reputation: 375474

Use .strip() to remove whitespace, and then check if the value is one you want to ignore:

if value.strip() in ['', 'NaN', 'nan']:
    pass  # ignore this value

Or, make it case-insensitive:

if value.strip().lower() in ['', 'nan']:
    pass  # ignore this value
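
Applied to a whole CSV file, the same check might be wrapped up as in this sketch, which uses only the standard csv module. The names JUNK, clean_cell and clean_rows, and the sample data, are made up for the example.

```python
import csv
import io

JUNK = {'', 'nan'}  # compared after strip() and lower()


def clean_cell(value: str) -> str:
    """Return '' for junk cells and the original value otherwise."""
    return '' if value.strip().lower() in JUNK else value


def clean_rows(rows):
    """Apply clean_cell to every cell of every row."""
    return [[clean_cell(cell) for cell in row] for row in rows]


# Made-up sample standing in for a real file opened with open(path, newline='')
sample = 'id,name,score\n1, ,NaN\n2,Bob,   \n3,nan,7\n'
cleaned = clean_rows(csv.reader(io.StringIO(sample)))
print(cleaned)
```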

Upvotes: 4

user7019687

You could read the CSV into a pandas DataFrame and then use DataFrame.fillna('') to replace the missing values with empty strings.
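
A rough sketch of that idea follows, with made-up sample data. By default, pandas.read_csv already parses empty fields and strings such as 'NaN' and 'nan' as missing values, so fillna('') covers those. Whitespace-only cells are not treated as missing, so this sketch adds a regex replace for them as an extra step of my own.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real .csv file
csv_text = 'name,city\nAnn,Paris\nBob,NaN\nCid,\nDan,nan\nEve,   \n'

# Empty fields, 'NaN' and 'nan' become missing values under the default settings
df = pd.read_csv(io.StringIO(csv_text))

# Replace the missing values with empty strings
df_filled = df.fillna('')

# Whitespace-only cells are ordinary strings, so blank them out separately
df_clean = df_filled.replace(r'^\s*$', '', regex=True)
print(df_clean)
```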

Upvotes: 0
