Akshat Shreemali
Akshat Shreemali

Reputation: 193

Pandas apply filter dynamically

I have a complex issue to solve. I have a dataframe like:

enter image description here

and the output desired is:

enter image description here

So it's basically logically grouping ie)

for first row: it's only a with a hence only 1 under a and remaining 0

for second row: it's a with b and hence 1 and 1 and rest 0 etc

Next based on the new df:

for each row, i have to filter a dataframe(say df3) eg:

enter image description here

1st row: df3.loc[(df3['name']=='a')]['flag'].count()

2nd row: df3.loc[(df3['name']=='a') & (df3['name']=='b')]['flag'].count()

3rd row: df3.loc[(df3['name']=='a') & (df3['name']=='b') & & (df3['name']=='c')]['flag'].count()

Any idea how to achieve this?

I am trying to generate those filters in a loop and then apply to df3 dataframe

Upvotes: 0

Views: 233

Answers (2)

Akshat Shreemali
Akshat Shreemali

Reputation: 193

I was able to do that. So using 'eval' on dataframe. I was able to generate a string using a for loop and was able to pass that string in an eval function. eg:

for i in range(0,len(df.columns)):
        if len(df.columns[i])==1:
            uu=[]
            st = column + "==" +"'"+str(df.columns[i][0])+"'"
            #print(column)
            uu.append(st)
            uu2 = ' | '.join([str(elem) for elem in uu]) 
            # uu2 string will directly go to df to get evaluated
            aa = df[df.eval(uu2)]

Upvotes: 0

jezrael
jezrael

Reputation: 862641

You can use numpy:

df1 = pd.DataFrame(0, index=list('aaaabbbb'), columns=list('abcd'))
print (df1)
   a  b  c  d
a  0  0  0  0
a  0  0  0  0
a  0  0  0  0
a  0  0  0  0
b  0  0  0  0
b  0  0  0  0
b  0  0  0  0
b  0  0  0  0

First create lower triangle matrix by numpy.tril and numpy.ones:

size = len(df1.columns)
         
a = np.tril(np.ones((size, size), dtype=int))
print (a)
[[1 0 0 0]
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]]

Then fill diagonal values to 1 by numpy.fill_diagonal

b = np.zeros((size, size), dtype=int)
np.fill_diagonal(b, 1)
print (b)
[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]

Repeat array a by number of columns with numpy.tile nad obverwrite each size row by indexing with values of array b:

c = np.tile(a, (size, 1))
c[::size] = b
print (c)
[[1 0 0 0]
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]
 [0 1 0 0]
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]
 [0 0 1 0]
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]
 [0 0 0 1]
 [1 1 0 0]
 [1 1 1 0]
 [1 1 1 1]]

Last convert to DataFrame:

df = pd.DataFrame(c, columns=df1.columns, index = np.repeat(df1.columns, size))
print (df)
   a  b  c  d
a  1  0  0  0
a  1  1  0  0
a  1  1  1  0
a  1  1  1  1
b  0  1  0  0
b  1  1  0  0
b  1  1  1  0
b  1  1  1  1
c  0  0  1  0
c  1  1  0  0
c  1  1  1  0
c  1  1  1  1
d  0  0  0  1
d  1  1  0  0
d  1  1  1  0
d  1  1  1  1

Upvotes: 1

Related Questions