Reputation: 193
I have a complex issue to solve. I have a dataframe like:
and the output desired is:
So it's basically logically grouping ie)
for first row: it's only a with a hence only 1 under a and remaining 0
for second row: it's a with b and hence 1 and 1 and rest 0 etc
Next based on the new df:
for each row, i have to filter a dataframe(say df3) eg:
1st row: df3.loc[(df3['name']=='a')]['flag'].count()
2nd row: df3.loc[(df3['name']=='a') & (df3['name']=='b')]['flag'].count()
3rd row: df3.loc[(df3['name']=='a') & (df3['name']=='b') & & (df3['name']=='c')]['flag'].count()
Any idea how to achieve this?
I am trying to generate those filters in a loop and then apply to df3 dataframe
Upvotes: 0
Views: 233
Reputation: 193
I was able to do that. So using 'eval' on dataframe. I was able to generate a string using a for loop and was able to pass that string in an eval function. eg:
for i in range(0,len(df.columns)):
if len(df.columns[i])==1:
uu=[]
st = column + "==" +"'"+str(df.columns[i][0])+"'"
#print(column)
uu.append(st)
uu2 = ' | '.join([str(elem) for elem in uu])
# uu2 string will directly go to df to get evaluated
aa = df[df.eval(uu2)]
Upvotes: 0
Reputation: 862641
You can use numpy:
df1 = pd.DataFrame(0, index=list('aaaabbbb'), columns=list('abcd'))
print (df1)
a b c d
a 0 0 0 0
a 0 0 0 0
a 0 0 0 0
a 0 0 0 0
b 0 0 0 0
b 0 0 0 0
b 0 0 0 0
b 0 0 0 0
First create lower triangle matrix by numpy.tril
and numpy.ones
:
size = len(df1.columns)
a = np.tril(np.ones((size, size), dtype=int))
print (a)
[[1 0 0 0]
[1 1 0 0]
[1 1 1 0]
[1 1 1 1]]
Then fill diagonal values to 1
by numpy.fill_diagonal
b = np.zeros((size, size), dtype=int)
np.fill_diagonal(b, 1)
print (b)
[[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 1]]
Repeat array a
by number of columns with numpy.tile
nad obverwrite each size
row by indexing with values of array b
:
c = np.tile(a, (size, 1))
c[::size] = b
print (c)
[[1 0 0 0]
[1 1 0 0]
[1 1 1 0]
[1 1 1 1]
[0 1 0 0]
[1 1 0 0]
[1 1 1 0]
[1 1 1 1]
[0 0 1 0]
[1 1 0 0]
[1 1 1 0]
[1 1 1 1]
[0 0 0 1]
[1 1 0 0]
[1 1 1 0]
[1 1 1 1]]
Last convert to DataFrame
:
df = pd.DataFrame(c, columns=df1.columns, index = np.repeat(df1.columns, size))
print (df)
a b c d
a 1 0 0 0
a 1 1 0 0
a 1 1 1 0
a 1 1 1 1
b 0 1 0 0
b 1 1 0 0
b 1 1 1 0
b 1 1 1 1
c 0 0 1 0
c 1 1 0 0
c 1 1 1 0
c 1 1 1 1
d 0 0 0 1
d 1 1 0 0
d 1 1 1 0
d 1 1 1 1
Upvotes: 1