Reputation: 210
This is the data frame that I want to search; for each search I need the row numbers of the matching rows. Matches must be exact: 'A' and 'AB' are completely different strings.
df2 = pd.DataFrame(np.array(['A','B','AC','AD','NAN','XX','BC','SLK','AC','AD','NAN','XU','BB','FG','XZ','XY','AD','NAN','NF','XY','AB','AC','AD','NAN','XY','LK','AC','AC','AD','NAN','KH','BC','GF','BC','AD']).reshape(5,7),columns=['a','b','c','d','e','f','g'])
a b c d e f g
0 A B AC AD NAN XX BC
1 SLK AC AD NAN XU BB FG
2 XZ XY AD NAN NF XY AB
3 AC AD NAN XY LK AC AC
4 AD NAN KH BC GF BC AD
The strings I am searching for come from this smaller data frame. The values in each row are combined with AND: all of them must appear in the same row of df2, and I want the index of every df2 row that matches.
df = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','NAN','BB','BC','AD']).reshape(6,2),columns=['a1','b1'])
a1 b1
0 A B # present in the first row of df2
1 C D # not present in any row of df2
2 AA AB # not present in any row of df2
3 AC AD # present in the second row of df2
4 NAN BB # present in the second row of df2
5 BC AD # present in the fourth row of df2
AND part
Desired output [0,1,3,4]
import pandas as pd
import numpy as np
index1 = df.index # Finds the number of row in df
terms=[]
React=[]
for i in range(len(index1)): #for loop to search each row of df dataframe
    terms=df.iloc[i] # Get i row
    terms[i]=terms.values.tolist() # converts to a list
    print(terms[i]) # to check
    # each row
    for term in terms[i]: # to search for each string in the
        print(term)
        results = pd.DataFrame()
        if results.empty:
            results = df2.isin( [ term ] )
        else:
            results |= df2.isin( [ term ] )
    results['count'] = results.sum(axis=1)
    print(results['count'])
    print(results[results['count']==len(terms[i])].index.tolist())
    React=results[results['count']==len(terms[i])].index.tolist()
React
This raises TypeError: unhashable type: 'list' on the line results = df2.isin( [ term ] )
The OR case should be easy, but it has to exclude the AND matches that are already accounted for in the first section.
React2=df2.isin([X]).any(1).index.tolist()
React2
Upvotes: 0
Views: 72
Reputation: 35135
This is not exactly the output you expected, but I computed the indexes for the AND condition. Each entry of the resulting list corresponds to one row of df and holds the df2 row indexes in which both of that row's values appear. Does this meet the intent of your question?
output = []
for i in range(len(df)):
    tmp = []
    for k in range(len(df2)):
        d = df2.loc[k].isin(df.loc[i,['a1']])
        f = df2.loc[k].isin(df.loc[i,['b1']])
        d = d.tolist()
        f = f.tolist()
        if sum(d) >= 1 and sum(f) >=1:
            tmp.append(k)
    output.append(tmp)
output
[[0], [], [], [0, 1, 3], [1], [0, 4]]
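If a single flat list like the [0,1,3,4] in the question is what you need, the per-row lists can be merged into one sorted union. Below is a minimal sketch of the same AND check without the inner loop over df2. The helper name and_matches and the variables per_row, and_rows and or_only_rows are my own for illustration. The OR part assumes that "exclude AND parts" means "rows that contain at least one search string but are not already in the AND result", which may not be what you intended.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array(['A', 'B', 'AC', 'AD', 'NAN', 'XX', 'BC',
              'SLK', 'AC', 'AD', 'NAN', 'XU', 'BB', 'FG',
              'XZ', 'XY', 'AD', 'NAN', 'NF', 'XY', 'AB',
              'AC', 'AD', 'NAN', 'XY', 'LK', 'AC', 'AC',
              'AD', 'NAN', 'KH', 'BC', 'GF', 'BC', 'AD']).reshape(5, 7),
    columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB',
              'AC', 'AD', 'NAN', 'BB', 'BC', 'AD']).reshape(6, 2),
    columns=['a1', 'b1'])


def and_matches(search, target):
    """For each row of `search`, list the `target` row labels that contain
    every value of that row (exact string match, any column)."""
    per_row = []
    for _, terms in search.iterrows():
        # One boolean Series per term: True where the target row contains it.
        hits = [target.eq(term).any(axis=1) for term in terms]
        # AND the per-term Series together into one boolean array.
        mask = np.logical_and.reduce(hits)
        per_row.append(target.index[mask].tolist())
    return per_row


per_row = and_matches(df, df2)
print(per_row)    # [[0], [], [], [0, 1, 3], [1], [0, 4]]

# Flatten into one sorted list of df2 rows that satisfy any AND pair.
and_rows = sorted(set().union(*per_row))
print(and_rows)   # [0, 1, 3, 4]

# Assumed reading of the OR part: rows holding at least one search string,
# minus the rows already reported by the AND step.
any_mask = df2.isin(df.to_numpy().ravel().tolist()).any(axis=1)
or_only_rows = [r for r in df2.index[any_mask.to_numpy()].tolist()
                if r not in and_rows]
print(or_only_rows)   # [2]
```

The per-row result is the same as the output of the nested loops above. Comparing with eq keeps the match exact, so 'A' never matches 'AB'.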
Upvotes: 1