Index-match/MaxIF function using Python

Question

I hope someone could help me solving this python scripting. I'd like to do something similar to index-match function in excel using Python.

I have two data sets shown below: -

a) The dataframe 1 (df):

Index	Node 1	Node 2	Value
0	ABBDD	DSGHSH	164.35
1	ABNEQY	ASFCZC	125.40
2	ABNERR	ZDCDBB	150.40
3	BANDER	DSVBDG	0.00
4	GDCHR	CVXSVG	94.05
5	GFRSA	ZFGDHJD	94.05
6	SRHNAI	SDWRWW	119.70
7	ZDCDBB	BANDERR	148.20
8	VBZ3VG	EWEWWE	148.20
9	SDZXZC	ADGSGSG	125.40

b) The dataframe 2 (df2):

Index	Node	Value
0	SDZX
1	VBZ3
2	AAAF
3	ADGS
4	DSVB
5	ZDCD
6	GFRSA
7	ABNE
8	ABBD
9	EWEW
10	SRHNA
11	BANDER

The first data frame comprises of two longer node names and a value column which I'd like to copy to the second data frame if its shorter node name contains in either one of the node names in the first data frame. If more than one value is available, only the maximum value will be selected.

My desired output:

Index	Node	Value
0	SDZX	125.40
1	VBZ3	148.20
2	AAAF	NaN
3	ADGS	125.40
4	DSVB	0.00
5	ZDCD	150.40
6	GFRSA	94.05
7	ABNE	150.40
8	ABBD	164.35
9	EWEW	148.20
10	SRHNA	119.70
11	BANDER	148.20

I have tried several ways. First step, I reduce the Node 1 and Node 2 name to 5 characters. I understand this is not the right way as this implies that I'll miss out those Nodes with more than 5 characters. I can't figure out how to use str.contains in join/merge function in Pandas to solve this problem.

df['Node 1'] = df['Node 1'].str.slice(0,5)
df['Node 2'] = df['Node 2'].str[:5]

In the second step, I wrote the following code. It can seem to do the job but looks quite inefficient. I wonder if someone has any suggestion to improve the scripts. Basically, I split the steps into two. I compare the node names in df2 with Node 1 and then Node 2. Ideally I'd like to be able to compare two nodes in df with df2 at once.

match_node1 = df2.reset_index().set_index('Node').join(df.set_index(['Node 1']))
match_node1 = match_node1.sort_values('Value',ascending=True).drop_duplicates(subset='index', keep='last').set_index('index')
match_node2 = df2.reset_index().set_index('Node').join(df.set_index(['Node 2']))
match_node2 = match_node2.sort_values('Value',ascending=True).drop_duplicates(subset='index', keep='last').set_index('index')

Any help/advice will be greatly appreciated.

Ahmed · Accepted Answer

I think this should work, now:

I started from scratch because you're only providing a tables not coded data simple.

Creating DataFrames

import pandas as pd
dicty1 = {'Node1':['ABNEQY', 'ABNERR', 'VBZ3VG'],
                'Node2': ['ADGSGSG', 'EWEWWE', 'DSVBDG'],
                'Value': [125.40, 150.40,  0.00]
            }
dicty2 = {'Node':['ABNE', 'ABNE','VBZ3', 'ADGS', 'EWEW', 'DSVB'],
                'value': None
              }

df1 = pd.DataFrame(dicty1)
df2 = pd.DataFrame(dicty2)

Data Frame 1

Node1	Node2	Value
ABNEQY	ADGSGSG	125.4
ABNERR	EWEWWE	150.4
VBZ3VG	DSVBDG	0.0

Data Frame 2

Node	value
ABNE	None
ABNE	None
VBZ3	None
ADGS	None
EWEW	None
DSVB	None

Now, I had to concatenate both columns [Node1, Node2] of df1 so, I can deal with them when I merge the dataframes.

cols = [df1['Node1'], df1['Node2']]
conc_cols = pd.concat(cols, names='Node', ignore_index= True)   # This's a 'pandas Series' I've to convert it into a 'pandas DataFrame'
rename = conc_cols.rename('new_Node')
new_df = pd.DataFrame(rename)

new_df Dataframe

new_Node
ABNEQY
ABNERR
VBZ3VG
ADGSGSG
EWEWWE
DSVBDG

Now, I need to concatenate it with its values so, I've to duplicate the value column to equalize the values of new_df.

new_df['duplicates_values'] = pd.concat([df1['Value'], df1['Value']], ignore_index=True)
new_df["new_Node"] = new_df['new_Node'].astype(str).str[:4]

Let's remove the duplicates from both df2 and new_df:

df2.drop_duplicates('Node',inplace=True)
new_df.drop_duplicates('new_Node',inplace=True, keep='last') #TODO change happens here

Then, pandas.merge():

df3 =  pd.merge(df2, new_df, how= 'left', left_on = 'Node', right_on = 'new_Node')

DataFrame3

Node	value	new_Node	duplicates_ values
ABNE	None	ABNE	150.4
VBZ3	None	VBZ3	0.0
ADGS	None	ADGS	125.4
EWEW	None	EWEW	150.4
DSVB	None	DSVB	0.0

Now, we need to select the duplicates_ values and convert it into a list then insert it into df2 -> value

df2['value'] = df3['duplicates_ values'].tolist()

OUTPUT Of DataFrame 2

Node	value
ABNE	150.4
VBZ3	0.0
ADGS	125.4
EWEW	150.4
DSVB	0.0

Index-match/MaxIF function using Python

Answers (1)

Related Questions