Protima Rani Paul

Reputation: 210

How to match strings across multiple data frames and return row indexes with AND and OR options

This is the data frame I want to search; I need the indexes of the matching rows. Matching must be exact: 'A' and 'AB' are completely different things.

df2 = pd.DataFrame(np.array(['A','B','AC','AD','NAN','XX','BC','SLK','AC','AD','NAN','XU','BB','FG','XZ','XY','AD','NAN','NF','XY','AB','AC','AD','NAN','XY','LK','AC','AC','AD','NAN','KH','BC','GF','BC','AD']).reshape(5,7),columns=['a','b','c','d','e','f','g'])


    a   b   c   d   e   f   g
0   A   B   AC  AD  NAN XX  BC
1   SLK AC  AD  NAN XU  BB  FG
2   XZ  XY  AD  NAN NF  XY  AB
3   AC  AD  NAN XY  LK  AC  AC
4   AD  NAN KH  BC  GF  BC  AD

The search strings come from this smaller data frame. Each row is an AND condition: both of its strings must appear in the same row of df2, and I want the index of every df2 row that matches.

df = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','NAN','BB','BC','AD']).reshape(6,2),columns=['a1','b1'])


    a1  b1
0   A   B   # both present in df2 row 0
1   C   D   # not present in any row of df2
2   AA  AB  # never both present in the same row of df2
3   AC  AD  # both present in df2 rows 0, 1 and 3
4   NAN BB  # both present in df2 row 1
5   BC  AD  # both present in df2 rows 0 and 4

AND part

Desired output [0,1,3,4]

import pandas as pd
import numpy as np


index1 = df.index # Finds the number of row in df
terms=[]
React=[]
for i in range(len(index1)): #for loop to search each row of df dataframe
  terms=df.iloc[i] # Get i row
  terms[i]=terms.values.tolist() # converts to a list
  print(terms[i]) # to check
    # each row
  for term in terms[i]: # to search for each string in the 
    print(term)
    results = pd.DataFrame()
    if results.empty:
      results = df2.isin( [ term ] )
    else:
      results |= df2.isin( [ term ] ) 
  results['count'] = results.sum(axis=1)
  print(results['count'])
  print(results[results['count']==len(terms[i])].index.tolist())
  React=results[results['count']==len(terms[i])].index.tolist()
  React

This raises TypeError: unhashable type: 'list' on the line results = df2.isin( [ term ] )

The OR part should be easy, but it has to exclude the rows that the AND part above has already accounted for.

React2=df2.isin([X]).any(1).index.tolist()
React2

Upvotes: 0

Views: 72

Answers (1)

r-beginners

Reputation: 35135

This is not exactly the output you asked for, but it gives the indexes for the AND condition. The result is a list with one entry per row of df, and each entry holds the df2 indexes where both values of that row appear. Does this meet the intent of your question?

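output = []
for i in range(len(df)):          # each search row of df
    tmp = []
    for k in range(len(df2)):     # each candidate row of df2
        # Boolean masks: which cells of df2 row k equal a1 / b1 of df row i
        d = df2.loc[k].isin(df.loc[i, ['a1']])
        f = df2.loc[k].isin(df.loc[i, ['b1']])
        # AND condition: both values must occur at least once in the row
        if d.any() and f.any():
            tmp.append(k)
    output.append(tmp)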

output
[[0], [], [], [0, 1, 3], [1], [0, 4]]
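
If you want the single flat list from the question rather than one list per df row, the per-row results can be merged into a set. The sketch below is one possible way to do that without the inner loop over df2, and it also covers the OR part. The helper name rows_containing and the variables and_rows and or_only_rows are made up for illustration. It also assumes that OR means "at least one of the two strings of some df row appears in the df2 row, and the row is not already an AND match", which is my reading of the question and may not be what you intend. DataFrame.isin compares whole cell values, so 'A' does not match 'AB'.

```python
import numpy as np
import pandas as pd

# Data frames from the question ('NAN' is an ordinary string here, not a missing value)
df2 = pd.DataFrame(
    np.array(['A', 'B', 'AC', 'AD', 'NAN', 'XX', 'BC',
              'SLK', 'AC', 'AD', 'NAN', 'XU', 'BB', 'FG',
              'XZ', 'XY', 'AD', 'NAN', 'NF', 'XY', 'AB',
              'AC', 'AD', 'NAN', 'XY', 'LK', 'AC', 'AC',
              'AD', 'NAN', 'KH', 'BC', 'GF', 'BC', 'AD']).reshape(5, 7),
    columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB',
              'AC', 'AD', 'NAN', 'BB', 'BC', 'AD']).reshape(6, 2),
    columns=['a1', 'b1'])


def rows_containing(frame, term):
    """Hypothetical helper: True for each row of frame holding term in any cell."""
    return frame.isin([term]).any(axis=1)


and_rows = set()  # df2 rows where both strings of some df row appear
or_rows = set()   # df2 rows where at least one string of some df row appears
for a1, b1 in df.itertuples(index=False):
    has_a = rows_containing(df2, a1)
    has_b = rows_containing(df2, b1)
    and_rows.update(df2.index[has_a & has_b].tolist())
    or_rows.update(df2.index[has_a | has_b].tolist())

and_rows = sorted(and_rows)
# Assumed OR semantics: drop the rows the AND part already accounts for
or_only_rows = sorted(or_rows - set(and_rows))

print(and_rows)      # [0, 1, 3, 4]
print(or_only_rows)  # [2]
```

With this data the AND list matches the desired output [0,1,3,4]. Row 2 is the only one left for OR, because it contains 'AB' and 'AD' but never both strings of any single df row.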

Upvotes: 1
