tdawg
tdawg

Reputation: 99

pandas: use map() to assign values based on range called from a dictionary

I tried to assign values (taken from a dataframe) that need 2 identifier columns to assign a value in a separate df. I tried everything I could think of and am closest now with my 2 identifiers packed into a tuple and together with their values into a dict but my syntax might be off (or I simply can't do it this way). Open for any other ways there might be (I'm not married to the tuple idea). Thank you, all.

import pandas as pd

# look up table for the 'condition'. Every 'cell' has 1 'baseline', 'washon', 'washoff' each 
# or only one 'baseline'
lut_cols = ['filename', 'condition', 'start', 'end']
lut_vals = [['cell1', 'baseline', 1, 24],
        ['cell1', 'washon', 50, 70],
        ['cell1', 'washoff', 100, 120],
        ['cell2', 'baseline', 2, 22],
        ['cell2', 'washon', 50, 70],
        ['cell2', 'washoff', 100, 120],
        ['cell3', 'baseline', 1, 20]]
lut_df = pd.DataFrame(lut_vals, columns=lut_cols)

# the data that needs a 'condition' column mapped based on the 'filename' and 'record' values that
# need to be in the range 'start' to 'end' given in lut_df
data_cols = ['filename', 'record', 'data']
data_vals = [['cell1', 1, 'some_data0'],
        ['cell1', 1, 'some_data1'],
        ['cell1', 1, 'some_data2'],
        ['cell1', 25, 'some_data3'],
        ['cell1', 25, 'some_data4'],
        ['cell1', 101, 'some_data5'],
        ['cell2', 2, 'some_data6'],
        ['cell2', 2, 'some_data7'],
        ['cell2', 50, 'some_data8'],
        ['cell2', 50, 'some_dat9'],
        ['cell2', 80, 'some_dat10']]
        
df = pd.DataFrame(data_vals, columns=data_cols)

# the 'filename' and 'start' to 'end' together make up the unique identifiers keys (as tuple)
key = ()
lut_dict = {}
for i in range(lut_df.filename.shape[0]):
   key = (lut_df.filename.iloc[i], range(lut_df.start.iloc[i], lut_df.end.iloc[i]))
   lut_dict[key] =  lut_df.condition[i]

# the 'record' column is now the index because I thought it would make life easier...
df.set_index('record', inplace=True)
df['condition'] = df['filename'].map(lut_dict)
print(df)

lut_dict for reference:

lut_dict:
{('cell1', range(1, 24)): 'baseline',
 ('cell1', range(50, 70)): 'washon',
 ('cell1', range(100, 120)): 'washoff',
 ('cell2', range(2, 22)): 'baseline',
 ('cell2', range(50, 70)): 'washon',
 ('cell2', range(100, 120)): 'washoff',
 ('cell3', range(1, 20)): 'baseline'}

Sadly, this is the output:

    filename    data    condition
record          
1   cell1   some_data0  NaN
1   cell1   some_data1  NaN
1   cell1   some_data2  NaN
25  cell1   some_data3  NaN
25  cell1   some_data4  NaN
101 cell1   some_data5  NaN
2   cell2   some_data6  NaN
2   cell2   some_data7  NaN
50  cell2   some_data8  NaN
50  cell2   some_dat9   NaN
80  cell2   some_dat10  NaN


Upvotes: 2

Views: 690

Answers (2)

user7864386
user7864386

Reputation:

You can use merge_asof. Since both DataFrames must be sorted by the key, we first sort each by the keys to merge by.

df = df.sort_values(by='record')
lut_df = lut_df.sort_values(by='start')
out = pd.merge_asof(df, lut_df, left_on='record', right_on='start', suffixes=('','_'))[['filename','data', 'condition']]

Output:

   filename        data condition
0     cell1  some_data0  baseline
1     cell1  some_data1  baseline
2     cell1  some_data2  baseline
3     cell2  some_data6  baseline
4     cell2  some_data7  baseline
5     cell1  some_data3  baseline
6     cell1  some_data4  baseline
7     cell2  some_data8    washon
8     cell2   some_dat9    washon
9     cell2  some_dat10    washon
10    cell1  some_data5   washoff

Upvotes: 2

Corralien
Corralien

Reputation: 120429

Create a dataframe from your lut_dict then merge it to your data:

df1 = pd.Series(lut_dict).rename_axis(['filename', 'record']).rename('condition') \
        .reset_index().explode('record')

out = df.merge(df1, on=['filename', 'record'], how='left')

Output:

>>> out
   filename record        data condition
0     cell1      1  some_data0  baseline
1     cell1      1  some_data1  baseline
2     cell1      1  some_data2  baseline
3     cell1     25  some_data3       NaN
4     cell1     25  some_data4       NaN
5     cell1    101  some_data5   washoff
6     cell2      2  some_data6  baseline
7     cell2      2  some_data7  baseline
8     cell2     50  some_data8    washon
9     cell2     50   some_dat9    washon
10    cell2     80  some_dat10       NaN

>>> df1
   filename record condition
0     cell1      1  baseline
0     cell1      2  baseline
0     cell1      3  baseline
0     cell1      4  baseline
0     cell1      5  baseline
..      ...    ...       ...
6     cell3     15  baseline
6     cell3     16  baseline
6     cell3     17  baseline
6     cell3     18  baseline
6     cell3     19  baseline

[142 rows x 3 columns]

Upvotes: 3

Related Questions