codeninja

Reputation: 379

Loop over dataframe columns to check against dictionary keys

Hi all so I have a dataframe:

df1
   WM          WH         WP           LC_REF
0 Tesla     Doritos     Spiders        DT 17 1C
1 Merc      Lays        Contortion     DT 17 1C
2 Lambo     Finale        NaN          DT 17 1C
3 Reddy       Red         NaN          DT 17 1C
4 Tomball     Fools       NaN          DT 17 1C

and a dictionary

example = {'Fools':'Car','Red':'Car','Merc':'Car','Tesla':'Car','Doritos':'Food','Spiders':'Food','Reddy':'Food','Tomball':'Food'}

I want to know the most efficient way to check each item in every column against the dictionary's keys, and then group the values, such that if a value maps to 'Car' or 'Food' the desired output is:

df2 if key matches Car

  LC_REF    vals        Category
0 DT 17 1C  Merc,Tesla    WM
1 DT 17 1C  Red, Fools    WH


df3 if key matches Food

  LC_REF    vals                 Category
0 DT 17 1C  Reddy,Tomball          WM
1 DT 17 1C  Doritos, Lays          WH
2 DT 17 1C  Spiders, Contortion    WP

So far

df = df.groupby('LC_REF',sort=False).agg(lambda x: ','.join(x.astype(str).str.upper()).replace(' ','')).stack().rename_axis(('LC_REF','a')).reset_index(name='vals')

is my code to group them, but I am struggling with how to correctly separate them by category. Any help would be great.

Upvotes: 1

Views: 834

Answers (1)

Abdou

Reputation: 13274

Try the following:

import pandas as pd
import io


example = {'Fools':'Car','Red':'Car','Merc':'Car',
           'Tesla':'Car','Doritos':'Food',
           'Spiders':'Food','Reddy':'Food',
           'Tomball':'Food', 'Lays':'Food', 'Contortion': 'Food'}

# Flip the example dictionary around (keys become values and values become keys)
value_dict = {}
for k, v in example.items():
    value_dict.setdefault(v, [])
    value_dict[v].append(k)

text = u"""WM          WH         WP           LC_REF
Tesla     Doritos     Spiders       DT 17 1C
Merc      Lays        Contortion    DT 17 1C
Lambo     Finale        NaN         DT 17 1C
Reddy       Red         NaN         DT 17 1C
Tomball     Fools       NaN         DT 17 1C"""

df1 = pd.read_table(io.StringIO(text), header=0, sep=r"\s{2,}", engine='python')
# Melt the dataframe first
df2 = pd.melt(df1, id_vars=['LC_REF'], value_name='vals', var_name='Category')

# Get your dataframe for Food
food_df = (df2.loc[df2.vals.isin(value_dict.get('Food'))]
              .groupby(['LC_REF', 'Category'])
              .apply(lambda x: ', '.join(x.vals))
              .reset_index(name='vals'))
print(food_df)

#      LC_REF Category                 vals
# 0  DT 17 1C       WH        Doritos, Lays
# 1  DT 17 1C       WM       Reddy, Tomball
# 2  DT 17 1C       WP  Spiders, Contortion

# Get your dataframe for Car
car_df = (df2.loc[df2.vals.isin(value_dict.get('Car'))]
             .groupby(['LC_REF', 'Category'])
             .apply(lambda x: ', '.join(x.vals))
             .reset_index(name='vals'))
print(car_df)

#      LC_REF Category         vals
# 0  DT 17 1C       WH   Red, Fools
# 1  DT 17 1C       WM  Tesla, Merc

Some guiding steps:

  1. Flip the example dictionary around: map each category to the list of items that belong to it. Doing this helps with searching: all the items related to Food go in one list and all the items related to Car in another. This way, if you want to check whether a column contains Food items, all you have to do is use the .isin method of a pandas Series and supply the list of Food items; the same procedure applies when looking for Car items.

  2. pd.melt the starting dataframe (df1). This reshapes it from wide to long, which is conducive to the subsequent operations.

  3. Use boolean indexing to get rows whose vals column contains either Food or Car items.

  4. Use a group-by operation on columns LC_REF and Category and then join the items in the vals column.
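As a variant of the steps above, the category lookup can also be done in a single pass with Series.map, without flipping the dictionary first: tag every melted value with its category, then group once on the category as well (a sketch reusing the same df1 and example as above; the frames name is my own):

```python
import io
import pandas as pd

# Same starting data and mapping as above
text = u"""WM          WH         WP           LC_REF
Tesla     Doritos     Spiders       DT 17 1C
Merc      Lays        Contortion    DT 17 1C
Lambo     Finale        NaN         DT 17 1C
Reddy       Red         NaN         DT 17 1C
Tomball     Fools       NaN         DT 17 1C"""

example = {'Fools': 'Car', 'Red': 'Car', 'Merc': 'Car',
           'Tesla': 'Car', 'Doritos': 'Food',
           'Spiders': 'Food', 'Reddy': 'Food',
           'Tomball': 'Food', 'Lays': 'Food', 'Contortion': 'Food'}

df1 = pd.read_table(io.StringIO(text), header=0, sep=r"\s{2,}", engine='python')

# Melt, then tag each value with its category via the original mapping.
# Values absent from `example` (Lambo, Finale, NaN) become NaN and drop out.
long_df = pd.melt(df1, id_vars=['LC_REF'], value_name='vals', var_name='Category')
long_df['key'] = long_df.vals.map(example)

grouped = (long_df.dropna(subset=['key'])
                  .groupby(['key', 'LC_REF', 'Category'])['vals']
                  .agg(', '.join)
                  .reset_index())

# One dataframe per category: frames['Food'], frames['Car']
frames = {k: g.drop(columns='key') for k, g in grouped.groupby('key')}
```

This avoids building value_dict and avoids one boolean-indexing pass per category, at the cost of one extra column during the groupby.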

I hope this helps.

Upvotes: 1
