Arman Mojaver
Arman Mojaver

Reputation: 451

Apply custom function to 2 dataframes in pandas

I got these 2 dataframes

data = {
    'col1': {1: 9, 2: 20, 3: 35, 4: 47, 5: 30, 6: 8},
    'col2': {1: 1, 2: 5, 3: 30, 4: 45, 5: 46, 6: 85},
    'col3': {1: 11, 2: 18, 3: 22, 4: 27, 5: 32, 6: 90},
}

thresholds = {
    0: {1: 0, 2: 10, 3: 20, 4: 0, 5: 40, 6: 0},
    1: {1: 10, 2: 15, 3: 25, 4: 10, 5: 50, 6: 1},
    2: {1: 20, 2: 17, 3: 29, 4: 60, 5: 60, 6: 2},
    3: {1: 30, 2: 25, 3: 39, 4: 70, 5: 70, 6: 3},
    4: {1: 40, 2: 35, 3: 50, 4: 75, 5: 80, 6: 10},
}

data = pd.DataFrame(data)
thresholds = pd.DataFrame(thresholds)

where the amount of rows is the same for both dataframes.

I need to obtain another dataframe that introduces one column of data (for example 'col1') in thresholds, and it fits every element of the column in its corresponding position in the row, returning all the indexes.

Here are the dataframes when printed.

data:

   col1  col2  col3
1     9     1    11
2    20     5    18
3    35    30    22
4    47    45    27
5    30    46    32
6     8    85    90

thresholds:

    0   1   2   3   4
1   0  10  20  30  40
2  10  15  17  25  35
3  20  25  29  39  50
4   0  10  60  70  75
5  40  50  60  70  80
6   0   1   2   3  10

result:

   col1  col2  col3
1     1     1     2
2     3     0     3
3     3     3     1
4     2     2     2
5     0     1     0
6     4     5     5

Examples: result['col1'][1] is 1 because data['col1'][1] --> 9 would be in position 1 when introduced in row 1 of thresholds: [0, 10, 20, 30, 40]

result['col2'][6] is 5 because data['col2'][6] --> 85 would be in position 5 when introduced in row 6 of thresholds: [0, 1, 2, 3, 10]

I am looking for the result dataframe using pandas. I found a function in pandas named searchsorted, and I know that I need to use .apply() (or .applymap()), and use lambda somewhere. I just don't know how to put the pieces together.

for loops are no good, too slow. The real dataframes are large, with thousands of rows and columns.

Here is the result dataframe in code (in case needed):

result = pd.DataFrame(
    {
        'col1': {1: 1, 2: 3, 3: 3, 4: 2, 5: 0, 6: 4},
        'col2': {1: 1, 2: 0, 3: 3, 4: 2, 5: 1, 6: 5},
        'col3': {1: 2, 2: 3, 3: 1, 4: 2, 5: 0, 6: 5},
    }
)

Upvotes: 0

Views: 159

Answers (1)

BENY
BENY

Reputation: 323226

In your case apply the searchsorted from numpy

result = data.apply(lambda x : pd.Series(np.searchsorted(thresholds.loc[x.name],x)),axis=1)
Out[13]: 
   0  1  2
1  1  1  2
2  3  0  3
3  3  3  1
4  2  2  2
5  0  1  0
6  4  5  5

Upvotes: 3

Related Questions