Reputation: 1048
I have a function that takes a dataframe, applies some transformations, and returns the categorical and numeric column names as two lists.
cat_cols, num_cols = Data_Type_And_Transformation(df_data_sample, 'MEAN')
cat_cols =
['var1_m2_Transform',
'var2_m2_Transform',
'var2_m3_Transform',
'var3_m3_Transform',
'var5_m3_Transform',
'var8_m3_Transform',
'var9_m3_Transform']
num_cols =
['ttl_change_3m',
'ttl_change_6m',
'base_rev_3m',
'csc_ttl_6m']
Then I am trying to create a dictionary whose keys are the column names and whose values are the data type, NUM or CAT, as below:
import pandas as pd
from collections import OrderedDict

attribute_df_cat = pd.DataFrame()
attribute_df_num = pd.DataFrame()
attribute_df_cat['Attribute'] = cat_cols
attribute_df_cat['Type'] = 'CAT'
attribute_df_num['Attribute'] = num_cols
attribute_df_num['Type'] = 'NUM'
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
attribute_df = pd.concat([attribute_df_cat, attribute_df_num])
attribute_df.set_index('Attribute', inplace=True)
attribute_dict = OrderedDict(attribute_df.to_dict('index'))
But this gives me a dict of the form:
Key Type Size Value
ttl_change_3m dict 1 {'Type': 'NUM'}
ttl_change_6m dict 1 {'Type': 'NUM'}
base_rev_3m dict 1 {'Type': 'NUM'}
csc_ttl_6m dict 1 {'Type': 'NUM'}
var1_m2_Transform dict 1 {'Type': 'CAT'}
var2_m2_Transform dict 1 {'Type': 'CAT'}
var2_m3_Transform dict 1 {'Type': 'CAT'}
var3_m3_Transform dict 1 {'Type': 'CAT'}
var5_m3_Transform dict 1 {'Type': 'CAT'}
var9_m3_Transform dict 1 {'Type': 'CAT'}
var8_m3_Transform dict 1 {'Type': 'CAT'}
Whereas I want it in the below format:
Key Type Size Value
ttl_change_3m str 1 NUM
ttl_change_6m str 1 NUM
base_rev_3m str 1 NUM
csc_ttl_6m str 1 NUM
var1_m2_Transform str 1 CAT
var2_m2_Transform str 1 CAT
var2_m3_Transform str 1 CAT
var3_m3_Transform str 1 CAT
var5_m3_Transform str 1 CAT
var9_m3_Transform str 1 CAT
var8_m3_Transform str 1 CAT
Also, I think I am taking too many steps to get to the result, and there might be a shorter or more efficient way to do this.
Can someone please help me with this?
Upvotes: 3
Views: 62
Reputation: 6159
I think you need np.where:
import numpy as np
import pandas as pd
df=pd.DataFrame({'Key':pd.Series(num_cols+cat_cols)})
df['Value']=np.where(df['Key'].isin(cat_cols), 'CAT','NUM')
#print(df)
#               Key Value
# ttl_change_3m NUM
# ttl_change_6m NUM
# base_rev_3m NUM
# csc_ttl_6m NUM
# var1_m2_Transform CAT
# var2_m2_Transform CAT
# var2_m3_Transform CAT
# var3_m3_Transform CAT
# var5_m3_Transform CAT
# var8_m3_Transform CAT
# var9_m3_Transform CAT
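If you need the flat dictionary from the question rather than a DataFrame, here is a minimal sketch that builds it straight from the two lists. It assumes cat_cols and num_cols hold exactly the names shown in the question (they are copied in so the snippet runs on its own), and it lists the numeric columns first, as in the question's output.

```python
from collections import OrderedDict

# Column lists copied from the question so the snippet is self-contained.
cat_cols = ['var1_m2_Transform', 'var2_m2_Transform', 'var2_m3_Transform',
            'var3_m3_Transform', 'var5_m3_Transform', 'var8_m3_Transform',
            'var9_m3_Transform']
num_cols = ['ttl_change_3m', 'ttl_change_6m', 'base_rev_3m', 'csc_ttl_6m']

# Pair every column name with its label as a plain string value.
attribute_dict = OrderedDict(
    [(col, 'NUM') for col in num_cols] + [(col, 'CAT') for col in cat_cols]
)

print(attribute_dict['ttl_change_3m'])      # NUM
print(attribute_dict['var1_m2_Transform'])  # CAT
print(len(attribute_dict))                  # 11
```

Starting from the DataFrame above, df.set_index('Key')['Value'].to_dict() should give the same mapping: calling to_dict() on a single column (a Series) maps each index label directly to its value, whereas DataFrame.to_dict('index') wraps every row in its own dict, which is where the {'Type': 'NUM'} values in the question come from.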
Upvotes: 1