Reputation: 99
I tried to assign values (taken from a dataframe) that need 2 identifier columns to assign a value in a separate df. I tried everything I could think of and am closest now with my 2 identifiers packed into a tuple and together with their values into a dict but my syntax might be off (or I simply can't do it this way). Open for any other ways there might be (I'm not married to the tuple idea). Thank you, all.
import pandas as pd
# look up table for the 'condition'. Every 'cell' has 1 'baseline', 'washon', 'washoff' each
# or only one 'baseline'
lut_cols = ['filename', 'condition', 'start', 'end']
lut_vals = [['cell1', 'baseline', 1, 24],
['cell1', 'washon', 50, 70],
['cell1', 'washoff', 100, 120],
['cell2', 'baseline', 2, 22],
['cell2', 'washon', 50, 70],
['cell2', 'washoff', 100, 120],
['cell3', 'baseline', 1, 20]]
lut_df = pd.DataFrame(lut_vals, columns=lut_cols)
# the data that needs a 'condition' column mapped based on the 'filename' and 'record' values that
# need to be in the range 'start' to 'end' given in lut_df
data_cols = ['filename', 'record', 'data']
data_vals = [['cell1', 1, 'some_data0'],
['cell1', 1, 'some_data1'],
['cell1', 1, 'some_data2'],
['cell1', 25, 'some_data3'],
['cell1', 25, 'some_data4'],
['cell1', 101, 'some_data5'],
['cell2', 2, 'some_data6'],
['cell2', 2, 'some_data7'],
['cell2', 50, 'some_data8'],
['cell2', 50, 'some_dat9'],
['cell2', 80, 'some_dat10']]
df = pd.DataFrame(data_vals, columns=data_cols)
# the 'filename' and 'start' to 'end' together make up the unique identifiers keys (as tuple)
key = ()
lut_dict = {}
for i in range(lut_df.filename.shape[0]):
key = (lut_df.filename.iloc[i], range(lut_df.start.iloc[i], lut_df.end.iloc[i]))
lut_dict[key] = lut_df.condition[i]
# the 'record' column is now the index because I thought it would make life easier...
df.set_index('record', inplace=True)
df['condition'] = df['filename'].map(lut_dict)
print(df)
lut_dict for reference:
lut_dict:
{('cell1', range(1, 24)): 'baseline',
('cell1', range(50, 70)): 'washon',
('cell1', range(100, 120)): 'washoff',
('cell2', range(2, 22)): 'baseline',
('cell2', range(50, 70)): 'washon',
('cell2', range(100, 120)): 'washoff',
('cell3', range(1, 20)): 'baseline'}
Sadly, this is the output:
filename data condition
record
1 cell1 some_data0 NaN
1 cell1 some_data1 NaN
1 cell1 some_data2 NaN
25 cell1 some_data3 NaN
25 cell1 some_data4 NaN
101 cell1 some_data5 NaN
2 cell2 some_data6 NaN
2 cell2 some_data7 NaN
50 cell2 some_data8 NaN
50 cell2 some_dat9 NaN
80 cell2 some_dat10 NaN
Upvotes: 2
Views: 690
Reputation:
You can use merge_asof
. Since both DataFrames must be sorted by the key, we first sort each by the keys to merge by.
df = df.sort_values(by='record')
lut_df = lut_df.sort_values(by='start')
out = pd.merge_asof(df, lut_df, left_on='record', right_on='start', suffixes=('','_'))[['filename','data', 'condition']]
Output:
filename data condition
0 cell1 some_data0 baseline
1 cell1 some_data1 baseline
2 cell1 some_data2 baseline
3 cell2 some_data6 baseline
4 cell2 some_data7 baseline
5 cell1 some_data3 baseline
6 cell1 some_data4 baseline
7 cell2 some_data8 washon
8 cell2 some_dat9 washon
9 cell2 some_dat10 washon
10 cell1 some_data5 washoff
Upvotes: 2
Reputation: 120429
Create a dataframe from your lut_dict
then merge it to your data:
df1 = pd.Series(lut_dict).rename_axis(['filename', 'record']).rename('condition') \
.reset_index().explode('record')
out = df.merge(df1, on=['filename', 'record'], how='left')
Output:
>>> out
filename record data condition
0 cell1 1 some_data0 baseline
1 cell1 1 some_data1 baseline
2 cell1 1 some_data2 baseline
3 cell1 25 some_data3 NaN
4 cell1 25 some_data4 NaN
5 cell1 101 some_data5 washoff
6 cell2 2 some_data6 baseline
7 cell2 2 some_data7 baseline
8 cell2 50 some_data8 washon
9 cell2 50 some_dat9 washon
10 cell2 80 some_dat10 NaN
>>> df1
filename record condition
0 cell1 1 baseline
0 cell1 2 baseline
0 cell1 3 baseline
0 cell1 4 baseline
0 cell1 5 baseline
.. ... ... ...
6 cell3 15 baseline
6 cell3 16 baseline
6 cell3 17 baseline
6 cell3 18 baseline
6 cell3 19 baseline
[142 rows x 3 columns]
Upvotes: 3