user3926863
user3926863

Reputation: 333

Filter Pandas Dataframe based on multiple columns with multiple values per single column

hope some one can help me.

Given one value per column, I succeeded in filtering a Pandas Dataframe (as shown in code below). However, depending on the analysis I am running, sometimes I would like to avoid specifying a value to filter for (for example, I would like to ignore the filtering by seg_device and filter the dataframe based on the os only).

However, the code below is forcing me to always specify some value (e.g. desktop). If I leave seg_device blank, df_ch_seg will return no data, given the condition df_ch.device == seg_device.

Would someone have any advice on how to make my code more flexible? My dataset is made of 1 million rows, per 16 columns. Below you see only 2 filters, but I have 15 in total (some of them are integers, some are strings columns). Thank you!

By looking at the code below, I would like to slightly change it so that it works in multiple occasions:

# [...]

seg_device = input('Enter device (e.g. desktop, ...): ')
seg_os = input('Enter operating system (e.g. Mac/iOS, Windows, ...):  ')

# [...]

# Define new dataframe df_ch_seg, based on df_ch, segmented based on above input values 
df_ch_seg = df_ch[(df_ch.device == seg_device)& (df_ch.os == seg_os)]

Upvotes: 0

Views: 666

Answers (2)

John Mommers
John Mommers

Reputation: 140

Maybe this code can help. Here 'a' is seg_device and 'b' is seg_os. So make sure that a='' and b=''. If you don't specify (input) 'a' then 'a' will be set to df['A'] and thereby all values in that column will be valid. The same could be done for 'b' and other columns in your Data Frame. Hope this is clear.

d = {'A':['a','b','a','b','a'], 'B':[1,2,3,4,5]}
df = pd.DataFrame(data=d)

a ='' # a is not specified and remains empty
if a=='': 
  a=df['A'] # set 'a' to all values in df['A'] 
b=1

(df['A']==a) & (df['B']>b)

Upvotes: 0

gold_cy
gold_cy

Reputation: 14216

If I understand correctly you just want to make this in to a function where the inputs provided will be tuples of (column, filter_value).

from ast import literal_eval

def mask_constructor(filters):
    mask = []
    for (col, val) in filters:
        op = (f"df_ch[{col}] == {val}")
        mask.append(op)
    return literal_eval(" & ".join(mask))

Then you could call it like so.

mask = mask_constructor(("device", "iPhone"), ("os", "iOS"))
df_ch[mask]

Upvotes: 1

Related Questions