Reputation: 17631
I am creating a script in which users input a CSV file. This CSV file has several "required columns" (if these columns do not exist, an error is thrown) and "default columns" (if these columns are not provided, I assume they have a default value). I'm confused about how to deal with the latter.
Here's a concrete example:
import pandas as pd
df = pd.read_csv("inputfile1.csv")
print(df)
filename category type
0 records1.txt 3 A1
1 records2.txt 4 A1
2 records7.txt 5 A1
3 records8.txt 1 C4
This file has two required columns, filename and category, and a default column, type. If the user had instead input:
import pandas as pd
df = pd.read_csv("inputfile1b.csv")
print(df)
filename category
0 records1.txt 3
1 records2.txt 4
2 records7.txt 5
3 records8.txt 1
I would assume that type has the value A1 for each row.
How would I set these default values? One attempt would be to check whether the column exists and, if not, set every value in that column to A1:
if 'type' not in df.columns:
    df['type'] = "A1"
However, what do I do if certain rows do not have values? These should also be treated as rows with the default value A1:
import pandas as pd
df = pd.read_csv("inputfile1c.csv")
print(df)
filename category type
0 records1.txt 3 NaN    ### this should be A1
1 records2.txt 4 A1
2 records7.txt 5 NaN    ### this should be A1
3 records8.txt 1 C4
Upvotes: 1
Views: 3343
Reputation: 69
You can regard them as missing values and try the following:
df['type'] = df['type'].fillna('A1')
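Note that this assumes the type column is present in the file; if it can be absent entirely, df['type'] raises a KeyError, so you would need a check like the one in the answer below.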
Upvotes: -1
Reputation: 411
fillna will work:
if 'type' not in df:
    # Column missing entirely: create it with the default value
    df['type'] = "A1"
else:
    # Column present: fill only the empty cells with the default
    df['type'] = df['type'].fillna('A1')
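The same check-or-fill pattern extends to the question's full requirement (required columns that must exist, default columns that may be absent or only partly filled). A minimal sketch, assuming the column names and default values from the question:

required_cols = ['filename', 'category']
default_cols = {'type': 'A1'}

# Required columns: error out if any are missing
missing_required = [c for c in required_cols if c not in df.columns]
if missing_required:
    raise ValueError(f"Missing required columns: {missing_required}")

# Default columns: create if absent, otherwise fill only the empty cells
for col, default in default_cols.items():
    if col not in df:
        df[col] = default
    else:
        df[col] = df[col].fillna(default)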
Upvotes: 3
Reputation: 2039
You can make use of a dictionary to do the same:
import pandas as pd

# Create a dictionary with column names and their respective default values
default_dict = {'col1': 1, 'col2': 2}

# Now read the input file
df = pd.read_csv("inputfile1b.csv")

# Find the columns missing from df
missing_cols = set(default_dict) - set(df.columns)

# Add the missing columns with their default values
for col in missing_cols:
    df[col] = default_dict[col]
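This only adds columns that are missing altogether; it does not touch empty cells in columns that are present. If those should also get defaults, DataFrame.fillna accepts a column-to-value mapping, so the same dictionary can be reused (a short sketch, assuming default_dict holds the intended fill values):

# Fill empty cells in columns that do exist, reusing the same dictionary;
# fillna with a dict applies each value only to its matching column
df = df.fillna(value=default_dict)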
Upvotes: 1