Dividing values in a dataframe based on another value in dataframe

Question

I'm trying to create a column in a dataframe that is a result of dividing values based on another value in the dataframe.

So this means that i would like to divide the SCI value that has a corrosponding Temp value between 19.5 and 20.5 and equal chainage.

I created a small dataframe that could help solve the problem.

data = {'Chainage':[10,20,30,10,20,30,10,20,30], 'SCI':[123, 45, 19, 18, 36, 125, 54, 78,85], 'Temp':[20.4,35,16,22,20.1,19.8,18,21,28]} 
df = pd.DataFrame(data)

The dataframe:

   Chainage  SCI  Temp
0        10  123  20.4
1        20   45  35.0
2        30   19  16.0
3        10   18  22.0
4        20   36  20.1
5        30  125  19.8
6        10   54  18.0
7        20   78  21.0
8        30   85  28.0

Here is the end result as it should be. Grouped by the chainage, and then the SCI values with a Temp between 19.5 and 20.5 is used to divide with the others in the group. I have tried to illustrate below:

   Chainage  SCI  Temp  f
0        10  123  20.4  123/123 = 1
3        10   18  22.0  123/18 = 6.8
6        10   54  18.0  123/54 = 2.2
7        20   78  21.0  36/78 = 0.4
1        20   45  35.0  36/45 = 0.8
4        20   36  20.1  36/36 = 1
2        30   19  16.0  6.6
5        30  125  19.8  1
8        30   85  28.0  1.5

I have been trying to use the groupby but gets stuck when adding the extra conditioning. Any help is appreciated.

Paul H · Accepted Answer

I think you need to break this up into a couple of steps.

Compute your "base temperatures"
merge the base temps into the main dataframe
do the division
clean up the extra columns.

That looks like this:

import pandas

data = {
    'Chainage':[10,20,30,10,20,30,10,20,30],
    'SCI':[123, 45, 19, 18, 36, 125, 54, 78,85],
    'Temp':[20.4,35,16,22,20.1,19.8,18,21,28]
}

df = pandas.DataFrame(data)


base_temp = (
    df.loc[df['Temp'].between(19.5, 20.5)]
      .groupby('Chainage', as_index=False)
      .first()
      .drop(columns=['SCI'])
)

The base_temp dataframe looks like this:

   Chainage  Temp
0        10  20.4
1        20  20.1
2        30  19.8

We queried out the rows where the temperature was in the correct range, but then did a group-by/first to ensure didn't have any duplicate Chainage values.

Now we can do everything else:


result = (
    df.merge(base_temp, on='Chainage', how='left', suffixes=('', '_base'))
      .assign(f=lambda df: df['Temp_base'] / df['Temp'])
      .drop(columns=['Temp_base']) 
)

Which gives you:

   Chainage  SCI  Temp         f
0        10  123  20.4  1.000000
1        20   45  35.0  0.574286
2        30   19  16.0  1.237500
3        10   18  22.0  0.927273
4        20   36  20.1  1.000000
5        30  125  19.8  1.000000
6        10   54  18.0  1.133333
7        20   78  21.0  0.957143
8        30   85  28.0  0.707143

Dividing values in a dataframe based on another value in dataframe

Answers (1)

Related Questions