Reputation: 1190
I have the following dataframe:
df_data = pd.DataFrame({'Car':[1,2,3,4],'Renault':[1,0,0,0],'Volkswagen':[0,1,0,0],'Porsche':[0,0,1,0],'Mercedes':[1,1,0,0]})
That looks like:
Car Renault Volkswagen Porsche Mercedes
0 1 1 0 0 1
1 2 0 1 0 1
2 3 0 0 1 0
3 4 0 0 0 0
At some point I build a dictionary with some query-like strings:
getCountry= {}
getCountry['isGerman']= ['(`Volkswagen`== 1)','((`Porsche`== 1) or (`Mercedes` == 1))']
When running:
for cond,all_qs in getCountry.items():
df_data[cond]=0
for qs in all_qs:
df_data.loc[df_data.query(qs).index, cond] = 1
This trigger the correct results:
Car Renault Volkswagen Porsche Mercedes isGerman
0 1 1 0 0 1 1
1 2 0 1 0 1 1
2 3 0 0 1 0 1
3 4 0 0 0 0 0
What I need is an extra column letting me know why the isGerman column was 1, for example an extra column that would look like:
Car Renault Volkswagen Porsche Mercedes isGerman isGermanDetails
0 1 1 0 0 1 1 Mercedes
1 2 0 1 0 1 1 Volkswagen, Mercedes
2 3 0 0 1 0 1 Porsche
3 4 0 0 0 0 0
This new column should contain all the '1' column names from the different queries.
Thought of grouping all the columns for isGerman and ideally have a sql-like query:
SELECT COLUMN_NAME WHERE Volkswagen = 1 or Porsche = 1 or Mercedes = 1.
I need to know which columns have made the isGerman go to 1, even if it is set to 1 multiple times.
Upvotes: 1
Views: 62
Reputation: 71689
country = {'German': ['Volkswagen', 'Porsche', 'Mercedes']}
for k, v in country.items():
df_data[f'Is{k}'] = df_data[v].any(1).astype(int)
df_data[f'Is{k}Details'] = (df_data[v] @ (pd.Index(v) + ', ')).str[:-2]
print(df_data)
Car Renault Volkswagen Porsche Mercedes IsGerman IsGermanDetails
0 1 1 0 0 1 1 Mercedes
1 2 0 1 0 1 1 Volkswagen, Mercedes
2 3 0 0 1 0 1 Porsche
3 4 0 0 0 0 0
Upvotes: 3