Reputation: 379
Hi all, I have a dataframe:
df1
        WM       WH          WP    LC_REF
0    Tesla  Doritos     Spiders  DT 17 1C
1     Merc     Lays  Contortion  DT 17 1C
2    Lambo   Finale         NaN  DT 17 1C
3    Reddy      Red         NaN  DT 17 1C
4  Tomball    Fools         NaN  DT 17 1C
and a dictionary
example = {'Fools':'Car','Red':'Car','Merc':'Car','Tesla':'Car','Doritos':'Food','Spiders':'Food','Reddy':'Food','Tomball':'Food'}
I'd like to know the most efficient way to check each item in every column and then group the columns, so that if a column value maps to 'Car' or 'Food' in the dictionary, the desired output is:
df2 if key matches Car
     LC_REF        vals Category
0  DT 17 1C  Merc,Tesla       WM
1  DT 17 1C  Red, Fools       WH
df3 if key matches Food
     LC_REF                 vals Category
0  DT 17 1C        Reddy,Tomball       WM
1  DT 17 1C        Doritos, Lays       WH
2  DT 17 1C  Spiders, Contortion       WP
So far I have

df = (df.groupby('LC_REF', sort=False)
        .agg(lambda x: ','.join(x.astype(str).str.upper()).replace(' ', ''))
        .stack()
        .rename_axis(('LC_REF', 'a'))
        .reset_index(name='vals'))

which groups them, but I am struggling with how to correctly separate them by category. Any help with this problem would be great.
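Here is a runnable version of my attempt (the sample-frame construction is mine, added so the snippet runs standalone):

```python
import pandas as pd

# Sample frame from the question, built inline
df = pd.DataFrame({
    'WM': ['Tesla', 'Merc', 'Lambo', 'Reddy', 'Tomball'],
    'WH': ['Doritos', 'Lays', 'Finale', 'Red', 'Fools'],
    'WP': ['Spiders', 'Contortion', float('nan'), float('nan'), float('nan')],
    'LC_REF': ['DT 17 1C'] * 5,
})

# Join every column's values per LC_REF, then stack columns into rows
out = (df.groupby('LC_REF', sort=False)
         .agg(lambda x: ','.join(x.astype(str).str.upper()).replace(' ', ''))
         .stack()
         .rename_axis(('LC_REF', 'a'))
         .reset_index(name='vals'))
print(out)
# Rows come out in column order WM, WH, WP, e.g.
# out.loc[0, 'vals'] == 'TESLA,MERC,LAMBO,REDDY,TOMBALL'
```

As you can see, this joins everything per column but keeps no link back to the Car/Food categories, which is where I am stuck.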
Upvotes: 1
Views: 834
Reputation: 13274
Try the following:
import pandas as pd
import io
example = {'Fools':'Car','Red':'Car','Merc':'Car',
'Tesla':'Car','Doritos':'Food',
'Spiders':'Food','Reddy':'Food',
'Tomball':'Food', 'Lays':'Food', 'Contortion': 'Food'}
# Flip the example dictionary around (keys become values and values become keys)
value_dict = {}
for k, v in example.items():
    value_dict.setdefault(v, [])
    value_dict[v].append(k)
text = u"""WM       WH       WP          LC_REF
Tesla    Doritos  Spiders     DT 17 1C
Merc     Lays     Contortion  DT 17 1C
Lambo    Finale   NaN         DT 17 1C
Reddy    Red      NaN         DT 17 1C
Tomball  Fools    NaN         DT 17 1C"""
df1 = pd.read_table(io.StringIO(text), header=0, sep=r"\s{2,}", engine='python')
# Melt the dataframe first
df2 = pd.melt(df1, id_vars=['LC_REF'], value_name='vals', var_name='Category')
# Get your dataframe for Food
food_df = (df2.loc[df2.vals.isin(value_dict.get('Food'))]
              .groupby(['LC_REF', 'Category'])
              .apply(lambda x: ', '.join(x.vals))
              .reset_index())
print(food_df)
# LC_REF Category 0
# 0 DT 17 1C WH Doritos, Lays
# 1 DT 17 1C WM Reddy, Tomball
# 2 DT 17 1C WP Spiders, Contortion
# Get your dataframe for Car
car_df = (df2.loc[df2.vals.isin(value_dict.get('Car'))]
             .groupby(['LC_REF', 'Category'])
             .apply(lambda x: ', '.join(x.vals))
             .reset_index())
print(car_df)
# LC_REF Category 0
# 0 DT 17 1C WH Red, Fools
# 1 DT 17 1C WM Tesla, Merc
Some guiding steps:

1. Flip the example dictionary around: keys become lists of values, and values become keys. Doing this helps with searching for items: all the items related to Food go in one list and all the items related to Car in another. Then, to check whether a column contains Food items, all you have to do is use the .isin method of a pandas Series and supply the list of Food items. Analogously, you'd follow the same procedure when looking for Car items.
2. pd.melt the starting dataframe (df1). This reshapes it from wide to long, which is conducive for the subsequent operations.
3. Use boolean indexing to get the rows whose vals column contains either Food or Car items.
4. Use a group-by operation on the columns LC_REF and Category, and then join the items in the vals column.
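The steps above also generalize to any number of categories. Here is a minimal sketch of my own variation: instead of flipping the dictionary, it maps each value straight through example with Series.map and builds one dataframe per category in a single pass (the frames name and inline frame construction are mine):

```python
import pandas as pd

example = {'Fools': 'Car', 'Red': 'Car', 'Merc': 'Car', 'Tesla': 'Car',
           'Doritos': 'Food', 'Spiders': 'Food', 'Reddy': 'Food',
           'Tomball': 'Food', 'Lays': 'Food', 'Contortion': 'Food'}

# Sample frame from the question, built inline so the sketch runs standalone
df1 = pd.DataFrame({
    'WM': ['Tesla', 'Merc', 'Lambo', 'Reddy', 'Tomball'],
    'WH': ['Doritos', 'Lays', 'Finale', 'Red', 'Fools'],
    'WP': ['Spiders', 'Contortion', float('nan'), float('nan'), float('nan')],
    'LC_REF': ['DT 17 1C'] * 5,
})

# Wide -> long, then tag every value with its category
# (values missing from the dict, e.g. Lambo and Finale, become NaN)
long_df = pd.melt(df1, id_vars=['LC_REF'], value_name='vals', var_name='Category')
long_df['cat'] = long_df['vals'].map(example)

# One dataframe per category: group by LC_REF/Category and join the values
frames = {cat: grp.groupby(['LC_REF', 'Category'])['vals']
                  .agg(', '.join)
                  .reset_index()
          for cat, grp in long_df.dropna(subset=['cat']).groupby('cat')}

print(frames['Car'])
print(frames['Food'])
```

frames['Car'] and frames['Food'] then correspond to the two per-category dataframes built explicitly above, with the unmapped values dropped along the way.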
I hope this helps.
Upvotes: 1