ShanZhengYang

Reputation: 17631

How to assume a "default column" for a pandas dataframe?

I am writing a script that takes a CSV file as user input. The file has several "required columns" (if any of these is missing, an error is thrown) and "default columns" (if any of these is not provided, I assume a default value for it). I'm unsure how to handle the latter.

Here's a concrete example:

import pandas as pd

df = pd.read_csv("inputfile1.csv")
print(df)

    filename           category   type
0   records1.txt       3          A1
1   records2.txt       4          A1
2   records7.txt       5          A1
3   records8.txt       1          C4

This file has two required columns filename and category, and a default column type. If the user had input instead:

import pandas as pd

df = pd.read_csv("inputfile1b.csv")
print(df)

    filename           category  
0   records1.txt       3         
1   records2.txt       4         
2   records7.txt       5          
3   records8.txt       1        

I would assume that type is of value A1 for each row.

How would I set these default values? One approach would be to check whether the column exists and, if it does not, set every value to A1:

if 'type' not in df.columns:
    # Attribute assignment (df.type = ...) does not create a new column; use item assignment
    df['type'] = "A1"

However, what do I do if only certain rows are missing a value? Those rows should also get the default value A1:

import pandas as pd

df = pd.read_csv("inputfile1c.csv")
print(df)

    filename           category   type
0   records1.txt       3                  ### this is A1
1   records2.txt       4          A1
2   records7.txt       5                  ### this is A1
3   records8.txt       1          C4

Upvotes: 1

Views: 3343

Answers (3)

zyun

Reputation: 69

You can treat them as missing values and fill them. Try the following:
df['type'] = df['type'].fillna('A1')
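
To see why fillna applies here, the following minimal sketch reads a file shaped like inputfile1c.csv from an in-memory string. The CSV text is a stand-in I made up to mirror the question's table, not the asker's actual file. It relies on read_csv parsing empty cells as NaN by default.

```python
import io
import pandas as pd

# Assumed stand-in for inputfile1c.csv: rows 0 and 2 leave `type` empty.
csv_text = """filename,category,type
records1.txt,3,
records2.txt,4,A1
records7.txt,5,
records8.txt,1,C4
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty cells are parsed as NaN, so isna() flags the rows without a value.
missing_mask = df["type"].isna()

# Assign the filled column back rather than relying on inplace=True.
df["type"] = df["type"].fillna("A1")
```

Assigning the result back avoids depending on whether an in-place call on a selected column propagates to the parent frame, which varies between pandas versions.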

Upvotes: -1

Alex Zisman

Reputation: 411

fillna will work

if 'type' not in df:
    df['type'] = "A1"
else:
    df['type'] = df['type'].fillna('A1')
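
The same idea can be wrapped into one loader that also enforces the required columns from the question. This is only a sketch: the function name load_with_defaults and the constants REQUIRED_COLUMNS and DEFAULT_VALUES are hypothetical names, and raising ValueError is an assumption about how the error should be reported.

```python
import io
import pandas as pd

# Column names and the default value come from the question;
# the constant and function names are hypothetical.
REQUIRED_COLUMNS = ["filename", "category"]
DEFAULT_VALUES = {"type": "A1"}


def load_with_defaults(source, required=REQUIRED_COLUMNS, defaults=DEFAULT_VALUES):
    """Read a CSV, fail on missing required columns, and apply defaults."""
    df = pd.read_csv(source)

    # Required columns: report every missing one in a single error.
    missing_required = [col for col in required if col not in df.columns]
    if missing_required:
        raise ValueError(f"Missing required columns: {missing_required}")

    # Default columns: create the column if absent, otherwise fill blank cells.
    for col, value in defaults.items():
        if col not in df.columns:
            df[col] = value
        else:
            df[col] = df[col].fillna(value)
    return df


# Usage with in-memory stand-ins for inputfile1b.csv and inputfile1c.csv.
no_type = load_with_defaults(io.StringIO("filename,category\nrecords1.txt,3\n"))
partial = load_with_defaults(
    io.StringIO("filename,category,type\nrecords1.txt,3,\nrecords8.txt,1,C4\n")
)
```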

Upvotes: 3

Abhishek Sharma

Reputation: 2039

You can use a dictionary to do the same:

# Create a default dictionary with column names and respective default values
default_dict = {'col1':1,'col2':2}

# Now read the input file
df = pd.read_csv("inputfile1b.csv")

# After this, find the default columns that are missing from df
missing_cols = list(set(default_dict) - set(df.columns))

# Add the missing columns with default values

for i in missing_cols:
    df[i] = default_dict[i]
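
The dictionary approach can also cover blank cells in columns that do exist, because DataFrame.fillna accepts a dict mapping column names to fill values. The sketch below assumes a defaults dict built from the question's type column plus a second, purely hypothetical status column, and uses made-up CSV text in place of a real input file.

```python
import io
import pandas as pd

# "type": "A1" comes from the question; "status" is a hypothetical
# second default column added only for illustration.
default_dict = {"type": "A1", "status": "new"}

# Assumed input: `type` is present but blank in row 0, `status` is absent.
csv_text = """filename,category,type
records1.txt,3,
records8.txt,1,C4
"""
df = pd.read_csv(io.StringIO(csv_text))

# Add default columns that are absent from the file.
for col in default_dict.keys() - set(df.columns):
    df[col] = default_dict[col]

# Fill blank cells column by column; columns not in the dict are untouched.
df = df.fillna(value=default_dict)
```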

Upvotes: 1
