cfort
cfort

Reputation: 2776

Upsample non-time data

Is there a better way to upsample a series of floats based on start and stop conditions contained in each row? Here's a sample of what I start with:

borehole   top  bottom  lithology
0     AP-2  94.6    95.1   dolomite
1     AP-2  95.1    96.7  limestone
2     AP-2  96.7    97.0   dolomite
3     AP-2  97.0    97.5  limestone
4     AP-2  97.5    97.8  limestone
5     AP-3  87.4    87.7  limestone
6     AP-3  87.7    88.1  limestone
7     AP-3  88.1    88.5   dolomite
8     AP-3  88.5    89.1  limestone

For each row I want to add rows between the top and bottom values using a chosen increment and forward fill the borehole and lithology values. The increment is constant in the dataframe. Here's the filled dataframe I want for an increment of 0.1. (This is just for the first two lines in the above dataframe:)

   borehole   Top       Lith
0      AP-2  94.6   dolomite
1      AP-2  94.7   dolomite
2      AP-2  94.8   dolomite
3      AP-2  94.9   dolomite
4      AP-2  95.0   dolomite
5      AP-2  95.1  limestone
6      AP-2  95.2  limestone
7      AP-2  95.3  limestone
8      AP-2  95.4  limestone
9      AP-2  95.5  limestone
10     AP-2  95.6  limestone
11     AP-2  95.7  limestone
12     AP-2  95.8  limestone
13     AP-2  95.9  limestone
14     AP-2  96.0  limestone
15     AP-2  96.1  limestone
16     AP-2  96.2  limestone
17     AP-2  96.3  limestone
18     AP-2  96.4  limestone
19     AP-2  96.5  limestone
20     AP-2  96.6  limestone
21     AP-2  96.7  limestone

Here's the code I've used and it works, but when I make a loop in pandas I wonder if I'm missing something obvious. pd.DataFrame.resample() is enticing, but I can't figure out how to make it work with non-time data.

import pandas as pd
import numpy as np

liths = pd.DataFrame(
{'borehole': {0: 'AP-2',
  1: 'AP-2',
  2: 'AP-2',
  3: 'AP-2',
  4: 'AP-2',
  5: 'AP-3',
  6: 'AP-3',
  7: 'AP-3',
  8: 'AP-3'},
 'bottom': {0: 95.099999999999994,
  1: 96.700000000000003,
  2: 97.0,
  3: 97.5,
  4: 97.799999999999997,
  5: 87.700000000000003,
  6: 88.099999999999994,
  7: 88.5,
  8: 89.099999999999994},
 'lithology': {0: 'dolomite',
  1: 'limestone',
  2: 'dolomite',
  3: 'limestone',
  4: 'limestone',
  5: 'limestone',
  6: 'limestone',
  7: 'dolomite',
  8: 'limestone'},
 'top': {0: 94.599999999999994,
  1: 95.099999999999994,
  2: 96.700000000000003,
  3: 97.0,
  4: 97.5,
  5: 87.400000000000006,
  6: 87.700000000000003,
  7: 88.099999999999994,
  8: 88.5}}
)

filled = []
increment = 0.1
for row in liths.itertuples():
    start = row.top
    end = row.bottom
    for i in np.arange(start, end, increment):
        filled.append([row.borehole, i, row.lithology])
filled = pd.DataFrame(filled, columns=['borehole', 'Top', 'Lith']); filled

Upvotes: 1

Views: 251

Answers (1)

piRSquared
piRSquared

Reputation: 294318

I'd construct a new dataframe from each row and concatenate them all

def expand(r):
    a = np.arange(r.top, r.bottom, .1)
    n = len(a)
    return pd.DataFrame(dict(
            borehole=[r.borehole] * n,
            lithology=[r.lithology] * n,
            top=a
        ))

pd.concat([expand(r) for r in df.itertuples()], ignore_index=True)

   borehole  lithology   top
0      AP-2   dolomite  94.6
1      AP-2   dolomite  94.7
2      AP-2   dolomite  94.8
3      AP-2   dolomite  94.9
4      AP-2   dolomite  95.0
5      AP-2  limestone  95.1
6      AP-2  limestone  95.2
7      AP-2  limestone  95.3
8      AP-2  limestone  95.4
9      AP-2  limestone  95.5
10     AP-2  limestone  95.6
11     AP-2  limestone  95.7
12     AP-2  limestone  95.8
13     AP-2  limestone  95.9
14     AP-2  limestone  96.0
15     AP-2  limestone  96.1
16     AP-2  limestone  96.2
17     AP-2  limestone  96.3
18     AP-2  limestone  96.4
19     AP-2  limestone  96.5
20     AP-2  limestone  96.6
21     AP-2  limestone  96.7
22     AP-2   dolomite  96.7
23     AP-2   dolomite  96.8
24     AP-2   dolomite  96.9
25     AP-2  limestone  97.0
26     AP-2  limestone  97.1
27     AP-2  limestone  97.2
28     AP-2  limestone  97.3
29     AP-2  limestone  97.4
30     AP-2  limestone  97.5
31     AP-2  limestone  97.6
32     AP-2  limestone  97.7
33     AP-3  limestone  87.4
34     AP-3  limestone  87.5
35     AP-3  limestone  87.6
36     AP-3  limestone  87.7
37     AP-3  limestone  87.8
38     AP-3  limestone  87.9
39     AP-3  limestone  88.0
40     AP-3   dolomite  88.1
41     AP-3   dolomite  88.2
42     AP-3   dolomite  88.3
43     AP-3   dolomite  88.4
44     AP-3   dolomite  88.5
45     AP-3  limestone  88.5
46     AP-3  limestone  88.6
47     AP-3  limestone  88.7
48     AP-3  limestone  88.8
49     AP-3  limestone  88.9
50     AP-3  limestone  89.0

Upvotes: 2

Related Questions