Reputation: 333
hope some one can help me.
Given one value per column, I succeeded in filtering a Pandas Dataframe (as shown in code below). However, depending on the analysis I am running, sometimes I would like to avoid specifying a value to filter for (for example, I would like to ignore the filtering by seg_device and filter the dataframe based on the os only).
However, the code below is forcing me to always specify some value (e.g. desktop). If I leave seg_device
blank, df_ch_seg
will return no data, given the condition df_ch.device == seg_device
.
Would someone have any advice on how to make my code more flexible? My dataset is made of 1 million rows, per 16 columns. Below you see only 2 filters, but I have 15 in total (some of them are integers, some are strings columns). Thank you!
By looking at the code below, I would like to slightly change it so that it works in multiple occasions:
# [...]
seg_device = input('Enter device (e.g. desktop, ...): ')
seg_os = input('Enter operating system (e.g. Mac/iOS, Windows, ...): ')
# [...]
# Define new dataframe df_ch_seg, based on df_ch, segmented based on above input values
df_ch_seg = df_ch[(df_ch.device == seg_device)& (df_ch.os == seg_os)]
Upvotes: 0
Views: 666
Reputation: 140
Maybe this code can help. Here 'a' is seg_device and 'b' is seg_os. So make sure that a='' and b=''. If you don't specify (input) 'a' then 'a' will be set to df['A'] and thereby all values in that column will be valid. The same could be done for 'b' and other columns in your Data Frame. Hope this is clear.
d = {'A':['a','b','a','b','a'], 'B':[1,2,3,4,5]}
df = pd.DataFrame(data=d)
a ='' # a is not specified and remains empty
if a=='':
a=df['A'] # set 'a' to all values in df['A']
b=1
(df['A']==a) & (df['B']>b)
Upvotes: 0
Reputation: 14216
If I understand correctly you just want to make this in to a function where the inputs provided will be tuples of (column, filter_value).
from ast import literal_eval
def mask_constructor(filters):
mask = []
for (col, val) in filters:
op = (f"df_ch[{col}] == {val}")
mask.append(op)
return literal_eval(" & ".join(mask))
Then you could call it like so.
mask = mask_constructor(("device", "iPhone"), ("os", "iOS"))
df_ch[mask]
Upvotes: 1