Danish

Reputation: 2871

Check whether a list of columns is in a pandas DataFrame

I have a data frame as shown below.

Unit_ID     Type      Sector       Plot_Number       Rental
1           Home      se1          22                50
2           Shop      se1          26                80

Given the above, I need to write a function that checks whether every column in a list, as shown below, is in the data frame.

If the list is ['Unit_ID', 'Sector', 'Usage_Type', 'Price']

Expected Output: column 'Usage_Type' and 'Price' is/are not in the dataframe.

If the list is ['Unit_ID', 'Sector', 'Type', 'Plot_Number']

Expected Output: All columns in the list are in the dataframe

Upvotes: 2

Views: 7528

Answers (4)

Mainland

Reputation: 4584

import pandas as pd

main_df = pd.DataFrame(data={'A': [1], 'C': [2], 'D': [3]})
print(main_df)
check_cols_list = ['B', 'C']
# Empty dataframe whose columns are the names to look for
check_cols_df = pd.DataFrame(columns=check_cols_list)
# isin() gives a boolean mask: True where the name is a column of main_df
print("Names of the check_cols_list present in the main_df columns are:")
print(check_cols_df.columns[check_cols_df.columns.isin(main_df.columns)])
print("Names of the check_cols_list not present in the main_df columns are:")
print(check_cols_df.columns[~check_cols_df.columns.isin(main_df.columns)])

Present output:

   A  C  D
0  1  2  3
Names of the check_cols_list present in the main_df columns are:
Index(['C'], dtype='object')
Names of the check_cols_list not present in the main_df columns are:
Index(['B'], dtype='object')
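
The same mask can be built without the intermediate empty dataframe by wrapping the list in a pd.Index. This is only a sketch of that variation; the names check_cols, mask, present and missing are made up for illustration and are not part of the answer above.

```python
import pandas as pd

main_df = pd.DataFrame(data={'A': [1], 'C': [2], 'D': [3]})
check_cols = pd.Index(['B', 'C'])

# Boolean mask: True where the name exists among main_df's columns
mask = check_cols.isin(main_df.columns)

present = check_cols[mask].tolist()
missing = check_cols[~mask].tolist()

print(present)  # ['C']
print(missing)  # ['B']
```

Converting to plain lists with tolist() makes the result easier to drop into a message than the Index repr.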

Upvotes: 0

SystemSigma_

Reputation: 1445

Why not simply

from typing import List

import pandas as pd

def has_columns(cols: List[str], df: pd.DataFrame) -> bool:
    try:
        # Selecting with a list raises KeyError if any label is missing
        df[cols]
    except KeyError as e:
        print(f'Missing columns: {e}')
        return False
    print(f'All columns {cols} in dataframe!')
    return True
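
For reference, a usage sketch against the data from the question. The dataframe is rebuilt by hand here and the variable names all_present and some_missing are hypothetical; the exact wording of the KeyError text that gets printed depends on the pandas version, so it is not shown.

```python
from typing import List

import pandas as pd

def has_columns(cols: List[str], df: pd.DataFrame) -> bool:
    try:
        # Selecting with a list raises KeyError if any label is missing
        df[cols]
    except KeyError as e:
        print(f'Missing columns: {e}')
        return False
    print(f'All columns {cols} in dataframe!')
    return True

# Data frame from the question, rebuilt by hand
df = pd.DataFrame({
    'Unit_ID': [1, 2],
    'Type': ['Home', 'Shop'],
    'Sector': ['se1', 'se1'],
    'Plot_Number': [22, 26],
    'Rental': [50, 80],
})

all_present = has_columns(['Unit_ID', 'Sector', 'Type', 'Plot_Number'], df)
some_missing = has_columns(['Unit_ID', 'Sector', 'Usage_Type', 'Price'], df)
print(all_present, some_missing)  # True False
```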

Upvotes: 0

anky

Reputation: 75100

You can try the following:

# To check whether the list of columns is a subset
# of the dataframe's columns, you can use:

def myf1(x, to_check):
    # Names in to_check that are not columns of x (set order is arbitrary)
    missing = set(to_check).difference(x.columns)
    if missing:
        return f"{' and '.join(missing)} are not available in the dataframe"
    return "All columns are available in the dataframe"

# df is the dataframe from the question
to_check = ['Unit_ID', 'Sector']
myf1(df, to_check)
#'All columns are available in the dataframe'

to_check = ['Unit_ID', 'Sector', 'XYZ']
myf1(df, to_check)
#'XYZ are not available in the dataframe'
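
Because a set difference does not keep the order of the input list, the missing names can come back in any order. If the order and the exact wording from the question matter, a list comprehension is one option. This is a minimal sketch, not the answer's method; check_columns is a hypothetical name and the dataframe is rebuilt by hand from the question.

```python
import pandas as pd

def check_columns(df, to_check):
    """Return a message naming the requested columns that df lacks."""
    # A list comprehension keeps the order of to_check, unlike a set difference
    missing = [col for col in to_check if col not in df.columns]
    if missing:
        quoted = " and ".join(f"'{col}'" for col in missing)
        return f"column {quoted} is/are not in the dataframe"
    return "All columns in the list are in the dataframe"

# Data frame from the question, rebuilt by hand
df = pd.DataFrame({
    'Unit_ID': [1, 2],
    'Type': ['Home', 'Shop'],
    'Sector': ['se1', 'se1'],
    'Plot_Number': [22, 26],
    'Rental': [50, 80],
})

print(check_columns(df, ['Unit_ID', 'Sector', 'Usage_Type', 'Price']))
# column 'Usage_Type' and 'Price' is/are not in the dataframe
print(check_columns(df, ['Unit_ID', 'Sector', 'Type', 'Plot_Number']))
# All columns in the list are in the dataframe
```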

Upvotes: 4

Pralay Ramteke

Reputation: 21

The list of column names can be obtained with:

columns = list(my_dataframe)

Now you can iterate through your search list and check if each element is present in the columns list.

def search_func(to_check, columns):
    not_present = []

    for i in to_check:
        if i not in columns:
            not_present.append(i)
    return not_present

to_check = ['Unit_ID', 'Sector', 'Usage_Type', 'Price']
not_present = search_func(to_check, columns)
if len(not_present) == 0:
    print("All columns are in the dataframe")
else:
    print(not_present, "not present in dataframe")

Upvotes: 1
