Reputation: 178
Suppose I have a natural-valued variable, e.g. "age", in my CSV dataset. The dataset is flawed, since some of the values are strings, e.g. "missing".
This code
personal_info = pd.read_csv("Age.csv", sep=',')
gives me the warning
DtypeWarning: Columns (6,10) have mixed types. Specify dtype option on import or set low_memory=False.
Adding dtype
personal_info = pd.read_csv("Age.csv", sep=',', error_bad_lines=False,
dtype={'age': int})
blows up when encountering the string "missing".
invalid literal for int() with base 10: 'missing'
How do I ignore the rows whose values are not in the variable's domain?
Upvotes: 1
Views: 258
Reputation: 4284
You can use the na_values argument:
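personal_info = pd.read_csv("Age.csv", sep=',', error_bad_lines=False,
dtype={'age': 'Int64'}, na_values=['missing'])
Note that a plain int column cannot hold missing values, so pandas raises an error if dtype is int and a "missing" entry is converted to NaN. The nullable 'Int64' dtype (capital I) accepts them. The affected rows can then be removed with personal_info.dropna(subset=['age']).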
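If the goal is to discard every row whose age is not a natural number, and not only the ones that say "missing", another option is to read the column as-is and coerce it afterwards. Below is a minimal sketch, assuming pd.to_numeric with errors='coerce' is acceptable here. The inline CSV text and the helper name load_valid_ages are hypothetical stand-ins for Age.csv and are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Age.csv (assumption, not the real data).
csv_text = "name,age\nalice,31\nbob,missing\ncarol,47\ndave,unknown\nerin,-5\n"


def load_valid_ages(source):
    """Read a CSV and keep only rows whose 'age' is a non-negative integer."""
    df = pd.read_csv(source, sep=",")
    # Anything that cannot be parsed as a number becomes NaN.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    # Keep rows that parsed, are non-negative, and have no fractional part.
    # Comparisons against NaN are False, so unparsable rows drop out here too.
    valid = df["age"].notna() & (df["age"] >= 0) & (df["age"] % 1 == 0)
    df = df[valid].copy()
    # No NaN values remain, so a plain integer dtype is safe now.
    df["age"] = df["age"].astype(int)
    return df.reset_index(drop=True)


personal_info = load_valid_ages(io.StringIO(csv_text))
print(personal_info)
```

With a real file, pass the path instead of the StringIO object, e.g. load_valid_ages("Age.csv"). Unlike na_values, this does not require listing every bad string in advance.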
Upvotes: 2