Reputation: 2871
I have a data frame as shown below.
Unit_ID Type Sector Plot_Number Rental
1 Home se1 22 50
2 Shop se1 26 80
From the above, I need to write a function that checks whether every column in a given list is present in the data frame.
If the list is ['Unit_ID', 'Sector', 'Usage_Type', 'Price']
Expected output: column 'Usage_Type' and 'Price' is/are not in the dataframe.
If the list is ['Unit_ID', 'Sector', 'Type', 'Plot_Number']
Expected output: All columns in the list are in the dataframe
Upvotes: 2
Views: 7528
Reputation: 4584
import pandas as pd

main_df = pd.DataFrame(data={'A':[1],'C':[2],'D':[3]})
print(main_df)
check_cols_list = ['B','C']
check_cols_df = pd.DataFrame(columns=check_cols_list)
print("Names of the check_cols_list present in the main_df columns are:")
print(check_cols_df.columns[check_cols_df.columns.isin(main_df.columns)])
print("Names of the check_cols_list not present in the main_df columns are:")
print(check_cols_df.columns[~check_cols_df.columns.isin(main_df.columns)])
Present output:
A C D
0 1 2 3
Names of the check_cols_list present in the main_df columns are:
Index(['C'], dtype='object')
Names of the check_cols_list not present in the main_df columns are:
Index(['B'], dtype='object')
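As a possible shortcut (a sketch, not part of the approach above), the temporary check_cols_df may not be needed: a pd.Index built from the list supports intersection and difference directly. The names check_cols, present and missing are only illustrative.

```python
import pandas as pd

main_df = pd.DataFrame(data={'A': [1], 'C': [2], 'D': [3]})

# Wrap the list in an Index so set-like operations are available
check_cols = pd.Index(['B', 'C'])

# Names that appear in both the list and the dataframe columns
present = check_cols.intersection(main_df.columns)
# Names from the list that the dataframe does not have
missing = check_cols.difference(main_df.columns)

print("Present:", list(present))
print("Missing:", list(missing))
```

Note that difference sorts its result by default, so the order of the original list is not necessarily kept.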
Upvotes: 0
Reputation: 1445
Why not simply
from typing import List

import pandas as pd


def has_columns(cols: List[str], df: pd.DataFrame) -> bool:
    try:
        # Selecting with a list raises KeyError if any column is missing
        df[cols]
    except KeyError as e:
        print(f'Missing columns: {e}')
        return False
    print(f'All columns {cols} in dataframe!')
    return True
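To try this on the data from the question, something like the following should work. The dataframe is rebuilt by hand from the question's table, and the function is repeated so the snippet runs on its own; the exact wording of the KeyError message depends on the pandas version, so it is not shown here.

```python
from typing import List

import pandas as pd


def has_columns(cols: List[str], df: pd.DataFrame) -> bool:
    try:
        # Selecting with a list raises KeyError if any column is missing
        df[cols]
    except KeyError as e:
        print(f'Missing columns: {e}')
        return False
    print(f'All columns {cols} in dataframe!')
    return True


# Dataframe rebuilt from the table in the question
df = pd.DataFrame({
    'Unit_ID': [1, 2],
    'Type': ['Home', 'Shop'],
    'Sector': ['se1', 'se1'],
    'Plot_Number': [22, 26],
    'Rental': [50, 80],
})

some_missing = has_columns(['Unit_ID', 'Sector', 'Usage_Type', 'Price'], df)
all_present = has_columns(['Unit_ID', 'Sector', 'Type', 'Plot_Number'], df)
```

Here some_missing should be False and all_present should be True.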
Upvotes: 0
Reputation: 75100
You can try the following:
# To check whether the list of columns is a subset
# of the dataframe columns, you can use:
def myf1(x, to_check):
    if not set(to_check).issubset(set(x.columns)):
        return f"{' and '.join(set(to_check).difference(x.columns))} are not available in the dataframe"
    return "All columns are available in the dataframe"

to_check = ['Unit_ID', 'Sector']
myf1(df, to_check)
#'All columns are available in the dataframe'

to_check = ['Unit_ID', 'Sector', 'XYZ']
myf1(df, to_check)
#'XYZ are not available in the dataframe'
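Since a set difference does not guarantee any particular order, the missing names can come out in a different order than in the input list. A minimal sketch that keeps the input order and follows the message wording from the question could look like this; check_columns is a hypothetical name, and the dataframe is rebuilt by hand from the question's table.

```python
import pandas as pd


def check_columns(df, to_check):
    # A list comprehension keeps the order of to_check, unlike a set difference
    missing = [c for c in to_check if c not in df.columns]
    if missing:
        names = " and ".join(f"'{c}'" for c in missing)
        return f"column {names} is/are not in the dataframe."
    return "All columns in the list are in the dataframe"


# Dataframe rebuilt from the table in the question
df = pd.DataFrame({
    'Unit_ID': [1, 2],
    'Type': ['Home', 'Shop'],
    'Sector': ['se1', 'se1'],
    'Plot_Number': [22, 26],
    'Rental': [50, 80],
})

msg_missing = check_columns(df, ['Unit_ID', 'Sector', 'Usage_Type', 'Price'])
msg_ok = check_columns(df, ['Unit_ID', 'Sector', 'Type', 'Plot_Number'])
print(msg_missing)
print(msg_ok)
```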
Upvotes: 4
Reputation: 21
The list of column names can be obtained with:
columns = list(my_dataframe)
Now you can iterate through your search list and check whether each element is present in the columns list.
def search_func(to_check, columns):
    # Collect every requested name that is absent from the column list
    not_present = []
    for i in to_check:
        if i not in columns:
            not_present.append(i)
    return not_present

to_check = ['Unit_ID', 'Sector', 'Usage_Type', 'Price']
not_present = search_func(to_check, columns)
if len(not_present) == 0:
    print("All columns are in the dataframe")
else:
    print(not_present, "not present in dataframe")
Upvotes: 1