jakkis
jakkis

Reputation: 83

Pandas check if cell value is between range from another data frame

So I have two data frames. Df1 has names and values as follows:

Source Target volume
dog cat 3
cat rat 2

Df2 has range for each name as follows:

name min max
dog 1 2
cat 1 3

what I would like to achieve is to check if the values in source or target columns in df1 exist in name column in df2, and if they exist to check if the volume is between them min and max values and result a data frame as shown below:

Source Target volume S. in range S. min S. max T. in range T. min T.max
dog cat 3 No 1 2 Yes 1 2
cat rat 2 yes 1 3 - - -

If the name does no exist in df2 there should be a line or similar indicating that the name was not found.

Im not entirely sure how to get this work as I need to check for values from two columns and combine them into one data frame.

Upvotes: 3

Views: 1123

Answers (2)

SeaBean
SeaBean

Reputation: 23227

You can use .set_index() on df2 to make it a lookup table by name. Then use .map() to map the Source and Target to get the min, max values:

df2a = df2.set_index('name')
df3 = df1.copy()

df3['S.in range'] = np.where(df3['Source'].isin(df2['name']), df3['volume'].between(df3['Source'].map(df2a['min']), df3['Source'].map(df2a['max'])), None)
df3['S.min'] = df3['Source'].map(df2a['min'])
df3['S.max'] = df3['Source'].map(df2a['max'])
df3['T.in range'] = np.where(df3['Target'].isin(df2['name']), df3['volume'].between(df3['Target'].map(df2a['min']), df3['Target'].map(df2a['max'])), None)
df3['T.min'] = df3['Target'].map(df2a['min'])
df3['T.max'] = df3['Target'].map(df2a['max'])

df3['S.in range'] = df3['S.in range'].replace({False: 'No', True: 'Yes'})
df3['T.in range'] = df3['T.in range'].replace({False: 'No', True: 'Yes'})

Result:

print(df3)


  Source Target  volume S.in range  S.min  S.max T.in range  T.min  T.max
0    dog    cat       3         No      1      2        Yes    1.0    3.0
1    cat    rat       2        Yes      1      3       None    NaN    NaN

(I suppose the column values should be named volume according to description and expected result. If not the case, can use values in the codes above and rename it if necessary for desired output).

Upvotes: 1

jezrael
jezrael

Reputation: 863701

Use DataFrame.merge with left join and rename columns for new DataFrames, then join in concat and set in range columns by numpy.select for also test if no match, so assigned Nones and Series.between:

df3 = (df1[['Source','volume']].rename(columns={'Source':'in range'})
        .merge(df2.rename(columns={'name':'in range'}), on='in range', how='left'))
df4 = (df1[['Target','volume']].rename(columns={'Target':'in range'})
         .merge(df2.rename(columns={'name':'in range'}), on='in range', how='left'))

df = pd.concat([df1, 
                df3.drop('volume',axis=1).add_prefix('S.'),
                df4.drop('volume',axis=1).add_prefix('T.')], axis=1)

m10 = df['S.min'].isna()
m11 = df['volume'].between(df['S.min'], df['S.max'])

m20 = df['T.min'].isna()
m21 = df['volume'].between(df['T.min'], df['T.max'])

df['S.in range'] = np.select([m10, m11], [None, 'Yes'], 'No')
df['T.in range'] = np.select([m20, m21], [None, 'Yes'], 'No')
print (df)
  Source Target  volume S.in range  S.min  S.max T.in range  T.min  T.max
0    dog    cat       3         No      1      2        Yes    1.0    3.0
1    cat    rat       2        Yes      1      3       None    NaN    NaN

Upvotes: 1

Related Questions