OregonMortensen
OregonMortensen

Reputation: 41

Dividing values in a dataframe based on another value in dataframe

I'm trying to create a column in a dataframe that is a result of dividing values based on another value in the dataframe.

So this means that i would like to divide the SCI value that has a corrosponding Temp value between 19.5 and 20.5 and equal chainage.

I created a small dataframe that could help solve the problem.

data = {'Chainage':[10,20,30,10,20,30,10,20,30], 'SCI':[123, 45, 19, 18, 36, 125, 54, 78,85], 'Temp':[20.4,35,16,22,20.1,19.8,18,21,28]} 
df = pd.DataFrame(data)

The dataframe:

   Chainage  SCI  Temp
0        10  123  20.4
1        20   45  35.0
2        30   19  16.0
3        10   18  22.0
4        20   36  20.1
5        30  125  19.8
6        10   54  18.0
7        20   78  21.0
8        30   85  28.0

Here is the end result as it should be. Grouped by the chainage, and then the SCI values with a Temp between 19.5 and 20.5 is used to divide with the others in the group. I have tried to illustrate below:

   Chainage  SCI  Temp  f
0        10  123  20.4  123/123 = 1
3        10   18  22.0  123/18 = 6.8
6        10   54  18.0  123/54 = 2.2
7        20   78  21.0  36/78 = 0.4
1        20   45  35.0  36/45 = 0.8
4        20   36  20.1  36/36 = 1
2        30   19  16.0  6.6
5        30  125  19.8  1
8        30   85  28.0  1.5

I have been trying to use the groupby but gets stuck when adding the extra conditioning. Any help is appreciated.

Upvotes: 0

Views: 63

Answers (1)

Paul H
Paul H

Reputation: 68156

I think you need to break this up into a couple of steps.

  1. Compute your "base temperatures"
  2. merge the base temps into the main dataframe
  3. do the division
  4. clean up the extra columns.

That looks like this:

import pandas

data = {
    'Chainage':[10,20,30,10,20,30,10,20,30],
    'SCI':[123, 45, 19, 18, 36, 125, 54, 78,85],
    'Temp':[20.4,35,16,22,20.1,19.8,18,21,28]
}

df = pandas.DataFrame(data)


base_temp = (
    df.loc[df['Temp'].between(19.5, 20.5)]
      .groupby('Chainage', as_index=False)
      .first()
      .drop(columns=['SCI'])
)

The base_temp dataframe looks like this:

   Chainage  Temp
0        10  20.4
1        20  20.1
2        30  19.8

We queried out the rows where the temperature was in the correct range, but then did a group-by/first to ensure didn't have any duplicate Chainage values.

Now we can do everything else:


result = (
    df.merge(base_temp, on='Chainage', how='left', suffixes=('', '_base'))
      .assign(f=lambda df: df['Temp_base'] / df['Temp'])
      .drop(columns=['Temp_base']) 
)

Which gives you:

   Chainage  SCI  Temp         f
0        10  123  20.4  1.000000
1        20   45  35.0  0.574286
2        30   19  16.0  1.237500
3        10   18  22.0  0.927273
4        20   36  20.1  1.000000
5        30  125  19.8  1.000000
6        10   54  18.0  1.133333
7        20   78  21.0  0.957143
8        30   85  28.0  0.707143

Upvotes: 1

Related Questions