Reputation: 31

R: Add interpolated values in between columns of dataframe?

I have a data frame that looks like this

Region      2000    2001   2002    2003    2004      2005
Australia   15.6    18.4   19.2    20.2    39.1      50.2
Norway      19.05   20.2   15.3    10      10.1      5.6

and basically I need a quick way to add extra columns in-between the currently existing columns that contain interpolated values of the surrounding columns.

Think of it like this: say you don't want columns for every year, but rather columns for every quarter. Then, for every pair of years (like 2000 and 2001), we would need to add 3 extra columns in-between these years.

The values of these columns will just be interpolated values. So, for Australia, the value in 2000 is 15.6 and in 2001 it is 18.4. So we calculate (18.4 - 15.6)/4 = 0.7, and then the values should now be 15.6, 16.3, 17, 17.7, and finally 18.4.

I have a working solution that builds up the new dataframe from scratch using a for loop. It is EXTREMELY slow. How to speed this up?

Upvotes: 2

Answers (3)

Ronak Shah

Reputation: 389325

Here is one way with tidyverse:

library(tidyverse)

df %>%
  #get data in long format
  pivot_longer(cols = -Region) %>%
  #group by Region
  group_by(Region) %>%
  #Create 4 number sequence between every 2 value
  summarise(temp = list(unlist(map2(value[-n()], value[-1], seq, length.out = 4)))) %>%
  #Get data in long format
  unnest(temp) %>%
  group_by(Region) %>%
  #Create column name
  mutate(col = paste0(rep(names(df)[-c(1, ncol(df))], each = 4), "Q", 1:4)) %>%
  #Spread data in wide format
  pivot_wider(names_from = col, values_from = temp)

# A tibble: 2 x 21
# Groups:   Region [2]
#  Region `2000Q1` `2000Q2` `2000Q3` `2000Q4` `2001Q1` `2001Q2` `2001Q3` `2001Q4` `2002Q1`
#  <fct>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1 Austr…     15.6     16.5     17.5     18.4     18.4     18.7     18.9     19.2     19.2
#2 Norway     19.0     19.4     19.8     20.2     20.2     18.6     16.9     15.3     15.3
# … with 11 more variables: `2002Q2` <dbl>, `2002Q3` <dbl>, `2002Q4` <dbl>,
#   `2003Q1` <dbl>, `2003Q2` <dbl>, `2003Q3` <dbl>, `2003Q4` <dbl>, `2004Q1` <dbl>,
#   `2004Q2` <dbl>, `2004Q3` <dbl>, `2004Q4` <dbl>

data

df <- structure(list(Region = structure(1:2, .Label = c("Australia", 
"Norway"), class = "factor"), `2000` = c(15.6, 19.05), `2001` = c(18.4, 
20.2), `2002` = c(19.2, 15.3), `2003` = c(20.2, 10), `2004` = c(39.1, 
10.1), `2005` = c(50.2, 5.6)), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 1

Santanu

Reputation: 382

This is how I did it when I had a similar problem. Not the most sophisticated solution but it works.

Australia=c(  15.6,  18.4,  19.2,  20.2,   39.1,     50.2)

library(zoo)
midpoints=rollmean(Australia, 2)
biyearly=c(rbind(Australia,midpoints))
midpoints=rollmean(biyearly, 2)
quarterly=c(rbind(biyearly,midpoints))
quarterly
#[1] 15.600 16.300 17.000 17.700 18.400 18.600 18.800 19.000 19.200 19.450 19.700
#[12] 19.950 20.200 24.925 29.650 34.375 39.100 41.875 44.650 47.425 50.200 33.600
#[23] 17.000 16.300

Upvotes: 1

mnist

Reputation: 6954

Here is a solution using dplyr. Should be more consistent and much faster than a loop:

# dummy data
df <- tibble(Region = LETTERS[1:5],
             `2000` = 1:5,
             `2001` = 3:7,
             `2002` = 10:14)

# function to calculate quarterly values
into_quarter <- function(x) x / 4

df %>% 
  # create new variables that contain quarterly values
  mutate_at(vars(starts_with("200")), 
            .funs = list("Q1" = into_quarter,
                         "Q2" = into_quarter,
                         "Q3" = into_quarter,
                         "Q4" = into_quarter)) %>% 
  # sort them approriatly.
  # can also be done with base R and order(names), depending on desired result
  select(Region, 
         starts_with("2000"),
         starts_with("2001"),
         starts_with("2002"),
         # in case there are also other variables and to not loose any information
         everything())

Upvotes: 0

R: Add interpolated values in between columns of dataframe?

Answers (3)

Related Questions