Jordan Ford

Reputation: 81

Upsample large dataset

Essentially I'm looking to upsample to fill in missing hours between forecast times.

I have a dataset that looks like this:

  case                              Regions        forecastTime WindSpeed_low
1    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 09:00:00            35
2    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 12:00:00            25
3    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-03 03:00:00            25
4   27 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-05 09:00:00            15
5   27 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-05 16:00:00            00
  WindSpeed_high  poly_id
1             45 fea1-289
2             NA fea1-289
3             NA fea1-289
4             20 fea1-289
5             NA fea1-289

Each issued forecast has a case number, an associated region and forecast time.

My goal is to expand the forecast times for each case to include all hours between the times the forecast changed:

  case                              Regions        forecastTime WindSpeed_low
1    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 09:00:00            35
2    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 10:00:00            35
3    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 11:00:00            35
4    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 12:00:00            25
5    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 13:00:00            25
  WindSpeed_high  poly_id
1             45 fea1-289
2             45 fea1-289
3             45 fea1-289
4             NA fea1-289
5             NA fea1-289

Here the forecast is the same between 2010-01-01 09:00:00 and 2010-01-01 11:59:59 (fd$WindSpeed_low == 35 and fd$WindSpeed_high == 45). At 2010-01-01 12:00:00, however, the forecast changes to fd$WindSpeed_low == 25 and fd$WindSpeed_high == NA. I was thinking I could group each forecast by case, but I am stuck on how to do this expansion correctly. I am relatively new to R.

Upvotes: 0

Views: 124

Answers (1)

Ronak Shah

Reputation: 388807

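You can use complete() and fill() from tidyr. This assumes forecastTime is already a date-time (POSIXct) column, since seq() with by = 'hour' needs one: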

library(dplyr)
library(tidyr)

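df %>%
  group_by(case, Regions) %>%
  # add one row per hour between the first and last forecast of each case
  complete(forecastTime = seq(min(forecastTime), max(forecastTime), by = 'hour')) %>%
  # carry the last known values down into the newly added rows
  fill(WindSpeed_low, poly_id) %>%
  ungroup()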
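
One caveat: in the expected output, WindSpeed_high is 45 for the 10:00 and 11:00 rows but stays NA from 12:00 on, because NA is a real forecast value there. A plain fill() on that column would overwrite those NAs with 45, and leaving it out of fill() leaves the new rows empty. A possible workaround, shown as a rough sketch only, is to tag each original row with an id, fill the id down, and copy every column from the row that owns that id. The obs_id column is a made-up helper name, the toy data just re-types the first three rows of the question, and the UTC time zone is an assumption.

```r
library(dplyr)
library(tidyr)

# Toy data: the first three rows from the question (case 1 only).
# tz = "UTC" is an assumption; use whatever zone the real data is in.
df <- tibble(
  case = 1,
  Regions = "EAST COAST-CAPE ST FRANCIS AND SOUTH",
  forecastTime = as.POSIXct(
    c("2010-01-01 09:00:00", "2010-01-01 12:00:00", "2010-01-03 03:00:00"),
    tz = "UTC"
  ),
  WindSpeed_low = c(35, 25, 25),
  WindSpeed_high = c(45, NA, NA),
  poly_id = "fea1-289"
)

expanded <- df %>%
  group_by(case, Regions) %>%
  # obs_id (hypothetical helper) marks which original forecast a row belongs to
  mutate(obs_id = row_number()) %>%
  complete(forecastTime = seq(min(forecastTime), max(forecastTime), by = "hour")) %>%
  arrange(forecastTime, .by_group = TRUE) %>%
  # only the id is filled down, so genuine NAs in the data are not touched
  fill(obs_id) %>%
  group_by(case, Regions, obs_id) %>%
  # copy every value from the original (first) row of each forecast block
  mutate(across(c(WindSpeed_low, WindSpeed_high, poly_id), first)) %>%
  ungroup() %>%
  select(-obs_id)

head(expanded, 5)
```

The design choice here is to fill a key instead of the values themselves, so an NA that was actually issued in a forecast is kept as NA in the hourly rows.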

Upvotes: 1
