Reputation: 83
So I have two data frames. Df1 has names and values as follows:
Source | Target | volume |
---|---|---|
dog | cat | 3 |
cat | rat | 2 |
Df2 has range for each name as follows:
name | min | max |
---|---|---|
dog | 1 | 2 |
cat | 1 | 3 |
what I would like to achieve is to check if the values in source or target columns in df1 exist in name column in df2, and if they exist to check if the volume is between them min and max values and result a data frame as shown below:
Source | Target | volume | S. in range | S. min | S. max | T. in range | T. min | T.max |
---|---|---|---|---|---|---|---|---|
dog | cat | 3 | No | 1 | 2 | Yes | 1 | 2 |
cat | rat | 2 | yes | 1 | 3 | - | - | - |
If the name does no exist in df2 there should be a line or similar indicating that the name was not found.
Im not entirely sure how to get this work as I need to check for values from two columns and combine them into one data frame.
Upvotes: 3
Views: 1123
Reputation: 23227
You can use .set_index()
on df2
to make it a lookup table by name
. Then use .map()
to map the Source
and Target
to get the min
, max
values:
df2a = df2.set_index('name')
df3 = df1.copy()
df3['S.in range'] = np.where(df3['Source'].isin(df2['name']), df3['volume'].between(df3['Source'].map(df2a['min']), df3['Source'].map(df2a['max'])), None)
df3['S.min'] = df3['Source'].map(df2a['min'])
df3['S.max'] = df3['Source'].map(df2a['max'])
df3['T.in range'] = np.where(df3['Target'].isin(df2['name']), df3['volume'].between(df3['Target'].map(df2a['min']), df3['Target'].map(df2a['max'])), None)
df3['T.min'] = df3['Target'].map(df2a['min'])
df3['T.max'] = df3['Target'].map(df2a['max'])
df3['S.in range'] = df3['S.in range'].replace({False: 'No', True: 'Yes'})
df3['T.in range'] = df3['T.in range'].replace({False: 'No', True: 'Yes'})
Result:
print(df3)
Source Target volume S.in range S.min S.max T.in range T.min T.max
0 dog cat 3 No 1 2 Yes 1.0 3.0
1 cat rat 2 Yes 1 3 None NaN NaN
(I suppose the column values
should be named volume
according to description and expected result. If not the case, can use values
in the codes above and rename it if necessary for desired output).
Upvotes: 1
Reputation: 863701
Use DataFrame.merge
with left join and rename columns for new DataFrame
s, then join in concat
and set in range
columns by numpy.select
for also test if no match, so assigned None
s and Series.between
:
df3 = (df1[['Source','volume']].rename(columns={'Source':'in range'})
.merge(df2.rename(columns={'name':'in range'}), on='in range', how='left'))
df4 = (df1[['Target','volume']].rename(columns={'Target':'in range'})
.merge(df2.rename(columns={'name':'in range'}), on='in range', how='left'))
df = pd.concat([df1,
df3.drop('volume',axis=1).add_prefix('S.'),
df4.drop('volume',axis=1).add_prefix('T.')], axis=1)
m10 = df['S.min'].isna()
m11 = df['volume'].between(df['S.min'], df['S.max'])
m20 = df['T.min'].isna()
m21 = df['volume'].between(df['T.min'], df['T.max'])
df['S.in range'] = np.select([m10, m11], [None, 'Yes'], 'No')
df['T.in range'] = np.select([m20, m21], [None, 'Yes'], 'No')
print (df)
Source Target volume S.in range S.min S.max T.in range T.min T.max
0 dog cat 3 No 1 2 Yes 1.0 3.0
1 cat rat 2 Yes 1 3 None NaN NaN
Upvotes: 1