Reputation: 37
I have transformed the following toy chemical reactions into a DataFrame for further bipartite network representation:
R1: A + B -> C
R2: C + D -> E
Source  Target
R1      C
A       R1
B       R1
R2      E
C       R2
D       R2
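For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the DataFrame is named `data` (the name used in the loop further down) and that both columns hold plain strings.

```python
import pandas as pd

# Toy bipartite edge list for R1: A + B -> C and R2: C + D -> E.
# Each row is one edge: substrate -> reaction, or reaction -> product.
data = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)
print(data)
```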
Now, I want to create a new DataFrame from this one, representing only the relationships between the reactions based on their compounds, for example:
In the DataFrame above, C is a Target of R1 and also a Source for R2, so the relationship should be R1->R2 (the only reaction-reaction relationship that can be obtained from the DataFrame above).
The code I have created for this task is the following:
import re

newData = []
for i in range(len(data)):
    for j in range(len(data)):
        # Row i's Target must be a compound (not a reaction) that is also row j's Source;
        # the pair then links row i's Source (a reaction) to row j's Target (a reaction).
        if data.iloc[i, 1] == data.iloc[j, 0] and not re.match("R.", data.iloc[i, 1]):
            newData.append(data.iloc[i, 0] + "\t" + data.iloc[j, 1])
The code works, however, for big tables (thousands of rows) it gets very slow... I'm still a beginner, so I would be really glad if you could help me to improve it. Thanks =D
Upvotes: 2
Views: 54
Reputation: 9264
You could merge the DataFrame with itself:
RtoC = df.merge(df, how='inner', left_on='Source', right_on='Target')\
         .drop(['Target_y', 'Source_x'], axis=1)\
         .rename(columns={'Target_x': 'Target', 'Source_y': 'Source'})
Then filter out the compounds by keeping only rows where both names contain a digit:
RtoC[(RtoC.Target.str.contains(r'\d')) & (RtoC.Source.str.contains(r'\d'))]
  Target Source
4     R2     R1
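A variant of the same self-merge idea, sketched end to end on the toy data. It splits the edges into "reaction produces compound" and "reaction consumes compound" before merging, so the result comes out directly as producer -> consumer. The helper `is_reaction`, the suffixes `_prod`/`_cons`, and the assumption that reaction names look like `R` followed by digits are all illustrative choices, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)

def is_reaction(names):
    # Assumption: reaction names are "R" followed by digits (R1, R2, ...).
    return names.str.match(r"R\d+")

produces = df[is_reaction(df["Source"])]  # reaction -> product edges
consumes = df[is_reaction(df["Target"])]  # substrate -> reaction edges

# Join on the shared compound: a product of one reaction that another reaction consumes.
pairs = (
    produces.merge(
        consumes, left_on="Target", right_on="Source", suffixes=("_prod", "_cons")
    )[["Source_prod", "Target_cons"]]
    .rename(columns={"Source_prod": "Source", "Target_cons": "Target"})
    .drop_duplicates()
    .reset_index(drop=True)
)
print(pairs)
```

Filtering before the merge keeps the intermediate table smaller, which may matter for tables with thousands of rows.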
Or convert to a dictionary, map the values, and filter:
mapper = dict(df.values[::-1])
df.Target = df.Target.map(mapper)
# na=False treats unmapped targets (NaN) as non-matching
df[(df.Target.str.contains(r'\d', na=False)) & (df.Source.str.contains(r'\d'))]
  Source Target
0     R1     R2
Upvotes: 1
Reputation: 164663
My preference would be for a dictionary-based approach:
import pandas as pd
d = df.set_index('Source')['Target']
r = {i for i in set(df['Source']).union(df['Target']) if 'R' in i}
{k: d.get(d.get(k)) for k in r if d.get(d.get(k))}
# {'R1': 'R2'}
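The lookup above keeps a single target per source. If a compound can be consumed by more than one reaction, a plain-Python variant along the following lines might be a starting point. It is a sketch only: the `edges` list, the `is_reaction` helper, and the rule that reaction names start with "R" are assumptions made for illustration.

```python
from collections import defaultdict

# Toy edge list (Source, Target), same rows as the DataFrame in the question.
edges = [("R1", "C"), ("A", "R1"), ("B", "R1"), ("R2", "E"), ("C", "R2"), ("D", "R2")]

def is_reaction(name):
    # Assumption: reaction names start with "R"; compound names do not.
    return name.startswith("R")

# Map each compound to every reaction that consumes it.
consumers = defaultdict(list)
for source, target in edges:
    if is_reaction(target):
        consumers[source].append(target)

# For each reaction -> product edge, pair the reaction with the product's consumers.
reaction_pairs = sorted(
    {
        (source, consumer)
        for source, target in edges
        if is_reaction(source)
        for consumer in consumers.get(target, [])
    }
)
print(reaction_pairs)  # [('R1', 'R2')]
```

With a DataFrame, the edge list could be obtained with something like `list(df.itertuples(index=False, name=None))`. Each row is visited a constant number of times plus once per match, instead of the nested loop over all row pairs.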
Upvotes: 1