How to find the elements from one column which also appear in another column of a DataFrame in Python

Question

I have transformed the following toy chemical reactions into a DataFrame for further bipartite network representation:

R1: A + B -> C

R2: C + D -> E

Source  Target
R1      C
A       R1
B       R1
R2      E
C       R2
D       R2

Now I want to create a new DataFrame from this one that represents only the relationships between reactions, based on the compounds they share. For example, in the DataFrame above C is a Target of R1 and also a Source for R2, so the relationship should be:

R1->R2

(the only reaction-reaction relationship I can obtain from the DataFrame above)

The code I have created for this task is the following:

import re

newData = []
for i in range(len(data)):
    for j in range(len(data)):
        # Row i's Target is a compound (not a reaction) that is also row j's Source
        if data.iloc[i, 1] == data.iloc[j, 0] and not re.match("R.", data.iloc[i, 1]):
            newData.append(data.iloc[i, 0] + "\t" + data.iloc[j, 1])

The code works, but for big tables (thousands of rows) it gets very slow. I'm still a beginner, so I would be really glad if you could help me improve it. Thanks =D

DJK · Accepted Answer

You could merge the DataFrame with itself:

RtoC = df.merge(df,how='inner',left_on='Source',right_on='Target')\
                .drop(['Target_y','Source_x'],axis=1)\
                .rename(columns={'Target_x':'Target','Source_y':'Source'})

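Then filter out the compounds by keeping only the rows where both columns contain a digit (this relies on reaction names such as R1 containing a digit and compound names not containing one):

RtoC[(RtoC.Target.str.contains(r'\d')) & (RtoC.Source.str.contains(r'\d'))]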


  Target Source
4     R2     R1
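For anyone who wants to run the merge approach end to end, here is a minimal sketch on the toy data from the question. The `is_reaction` helper and the `^R\d+$` naming pattern are assumptions made for illustration (they are not part of the answer above); adjust the pattern to however your reactions are actually named.

```python
import pandas as pd

# Toy edge list from the question: R1 (A + B -> C) and R2 (C + D -> E).
df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)

# Self-merge: pair each row with every row whose Target equals its Source.
merged = (
    df.merge(df, how="inner", left_on="Source", right_on="Target")
    .drop(["Target_y", "Source_x"], axis=1)
    .rename(columns={"Target_x": "Target", "Source_y": "Source"})
)


def is_reaction(names):
    # Assumption: reactions are named "R" followed by digits (R1, R2, ...).
    return names.str.match(r"^R\d+$")


# Keep only the rows where both ends are reactions.
reaction_links = merged[is_reaction(merged["Target"]) & is_reaction(merged["Source"])]

# Collect the result as (source reaction, target reaction) pairs.
edges = list(zip(reaction_links["Source"], reaction_links["Target"]))
print(edges)
```

Because the merge is a hash join rather than a nested Python loop, it should scale much better than the double `for` loop on tables with thousands of rows.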

Or convert to a dictionary, map the values, and filter:

mapper = dict(df.values[::-1])

df.Target = df.Target.map(mapper)

df[(df.Target.str.contains(r'\d')) & (df.Source.str.contains(r'\d'))]

  Source Target
0     R1     R2
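One thing to keep in mind with the dictionary approach: a dict holds a single value per key, so a compound that is consumed by several reactions keeps only one of them. A possible variation, sketched below, splits the edges into "reaction produces compound" and "compound is consumed by reaction" first and then merges the two. The extra reaction R3 (C -> F) and the `^R\d+$` naming pattern are hypothetical, added only to show the many-to-many case.

```python
import pandas as pd

# Question's data plus a hypothetical R3: C -> F, so compound C feeds two reactions.
df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D", "R3", "C"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2", "F", "R3"],
    }
)

# Assumption: reactions are named "R" followed by digits.
reaction = r"^R\d+$"
produces = df[df["Source"].str.match(reaction)]  # reaction -> compound
consumes = df[df["Target"].str.match(reaction)]  # compound -> reaction

# Join on the shared compound: product of one reaction == reactant of another.
links = produces.merge(
    consumes, left_on="Target", right_on="Source", suffixes=("_prod", "_cons")
)

reaction_edges = (
    links[["Source_prod", "Target_cons"]]
    .rename(columns={"Source_prod": "Source", "Target_cons": "Target"})
    .drop_duplicates()
    .reset_index(drop=True)
)
print(reaction_edges)
```

Filtering before the merge also keeps the intermediate table smaller, since compound-to-reaction rows are never paired with each other.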
