user3206440

Reputation: 5069

adding a row to a data frame in long format

Given a data frame df like the one below:

text <- "
parameter,car,qtr,val
a,a3,FY18Q1,23
b,a3,FY18Q1,10000
a,a3,FY18Q2,14
b,a3,FY18Q2,12000
a,cla,FY18Q1,15
b,cla,FY18Q1,12000
c,cla,FY18Q1,5.5
a,cla,FY18Q2,26
b,cla,FY18Q2,10000
c,cla,FY18Q2,6.2
"
df <- read.table(textConnection(text), sep = ",", header = TRUE)

I want to add a row with parameter b_diff for each car, qtr combination, whose val is the difference in parameter b between two consecutive quarters. The quarters in ascending order are FY18Q1, FY18Q2. For the first quarter, FY18Q1, the val for b_diff should be NA, as there is no previous quarter.

The expected output is as below.

parameter  car  qtr     val
a          a3   FY18Q1  23
b          a3   FY18Q1  10000
b_diff     a3   FY18Q1  NA
a          a3   FY18Q2  14
b          a3   FY18Q2  12000
b_diff     a3   FY18Q2  2000
a          cla  FY18Q1  15
b          cla  FY18Q1  12000
c          cla  FY18Q1  5.5
b_diff     cla  FY18Q1  NA
a          cla  FY18Q2  26
b          cla  FY18Q2  10000
c          cla  FY18Q2  6.2
b_diff     cla  FY18Q2  -2000

How do I go about doing this with dplyr?

Upvotes: 0

Views: 189

Answers (2)

shadowtalker

Reputation: 13913

Here is one algorithm:

  1. Reshape the data to "wide" format, so that qtr and car form a unique row index, with the parameter column "spread" into columns
  2. Within each car value, take the 1-period diff of the new parameter_b column
  3. Reshape the data back to "long" format

Equivalent code, using reshape2 and dplyr:

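library(dplyr)
library(reshape2)

# Optional: you could just use `c(NA, diff(x))` below, but this is more general
padded_diff <- function(x, lag = 1L) {
  c(rep.int(NA, lag), diff(x, lag = lag))
}

df %>%
  dcast(car + qtr ~ parameter, value.var = "val") %>%
  group_by(car) %>%   # step 2: take the diff within each car, not across cars
  mutate(b_diff = padded_diff(b)) %>%
  ungroup() %>%
  as.data.frame() %>%
  melt(id.vars = c("car", "qtr"), variable.name = "parameter", value.name = "val") %>%
  # drop cells created by the reshape (a3 has no "c"), but keep the NA b_diff rows
  filter(!is.na(val) | parameter == "b_diff") %>%
  arrange(car, qtr, parameter)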
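
For comparison, the three steps above can also be sketched with tidyr's pivot_wider and pivot_longer in place of reshape2. This is a swapped-in variant rather than the code from this answer; the name res_wide is only illustrative, and the data is repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df <- read.csv(text = "parameter,car,qtr,val
a,a3,FY18Q1,23
b,a3,FY18Q1,10000
a,a3,FY18Q2,14
b,a3,FY18Q2,12000
a,cla,FY18Q1,15
b,cla,FY18Q1,12000
c,cla,FY18Q1,5.5
a,cla,FY18Q2,26
b,cla,FY18Q2,10000
c,cla,FY18Q2,6.2", stringsAsFactors = FALSE)

res_wide <- df %>%
  # 1. wide format: one row per (car, qtr), one column per parameter
  pivot_wider(names_from = parameter, values_from = val) %>%
  arrange(car, qtr) %>%
  # 2. one-period difference of b within each car
  group_by(car) %>%
  mutate(b_diff = b - lag(b)) %>%
  ungroup() %>%
  # 3. back to long format
  pivot_longer(-c(car, qtr), names_to = "parameter", values_to = "val") %>%
  # drop cells that exist only because a3 has no "c"; keep the NA b_diff rows
  filter(!is.na(val) | parameter == "b_diff") %>%
  select(parameter, car, qtr, val)

res_wide
```

Because b_diff is added as the last column of the wide table, it also ends up last within each car, qtr block after reshaping back.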

Here is another algorithm:

  1. Group the data frame by car
  2. Within each group, temporarily filter so that only rows with parameter == "b" are present
  3. Take the 1-period diff of the val column
  4. Remove the filter and ungroup

Equivalent code using only dplyr (plus the padded_diff helper defined above), with a temporary table to simulate a "removable" filter:

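make_b_diff_within_group <- function(df) {
  tmp <- df %>%
    filter(parameter == "b") %>%
    arrange(qtr) %>%   # make sure the diff runs in quarter order
    transmute(
      car = car,       # keep car so the new rows are not left with NA
      qtr = qtr,
      val = padded_diff(val),
      parameter = "b_diff")

  bind_rows(df, tmp)
}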

df %>%
  group_by(car) %>%
  do(make_b_diff_within_group(.)) %>%
  ungroup() %>%
  arrange(car, qtr, parameter)

This second algorithm could be implemented using several other "split-apply-combine" tools, including the tapply or by functions in base R, the ddply function in the plyr package (an ancestor of dplyr by the same author), and base R's split function combined with dplyr, as shown in this answer.
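
As a rough illustration of the base R route, here is a minimal sketch using split, lapply and rbind. The helper name add_b_diff is hypothetical, and the data is repeated so the block is self-contained.

```r
df <- read.csv(text = "parameter,car,qtr,val
a,a3,FY18Q1,23
b,a3,FY18Q1,10000
a,a3,FY18Q2,14
b,a3,FY18Q2,12000
a,cla,FY18Q1,15
b,cla,FY18Q1,12000
c,cla,FY18Q1,5.5
a,cla,FY18Q2,26
b,cla,FY18Q2,10000
c,cla,FY18Q2,6.2", stringsAsFactors = FALSE)

# Hypothetical helper: append b_diff rows to the rows of a single car
add_b_diff <- function(d) {
  b <- d[d$parameter == "b", ]
  b <- b[order(b$qtr), ]                        # quarter labels sort correctly as strings here
  b$val <- b$val - c(NA, head(b$val, -1))       # current minus previous quarter
  b$parameter <- "b_diff"
  rbind(d, b)
}

# split by car, apply the helper to each piece, combine the pieces again
res <- do.call(rbind, lapply(split(df, df$car), add_b_diff))
res <- res[order(res$car, res$qtr), ]           # ties keep their order, so b_diff stays last
rownames(res) <- NULL

res
```

The sort at the end relies on order() leaving ties in their original order, which is what puts each b_diff row after the other parameters of the same car and quarter.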

Upvotes: 1

www

Reputation: 39174

A solution using dplyr and purrr. We can create a group ID using group_indices, split the data frame by that ID, compute the b_diff rows, and then combine the pieces. df5 is the final output.

library(dplyr)
library(purrr)

df2 <- df %>% mutate(GroupID = group_indices(., car, qtr))

df3 <- df2 %>%
  filter(parameter %in% "b") %>%
  group_by(car) %>%
  mutate(val = val - lag(val), parameter = "b_diff") %>%
  ungroup() %>%
  split(f = .$GroupID)

df4 <- df2 %>% split(f = .$GroupID)

df5 <- map2_dfr(df4, df3, bind_rows) %>% select(-GroupID)

df5
#    parameter car    qtr     val
# 1          a  a3 FY18Q1    23.0
# 2          b  a3 FY18Q1 10000.0
# 3     b_diff  a3 FY18Q1      NA
# 4          a  a3 FY18Q2    14.0
# 5          b  a3 FY18Q2 12000.0
# 6     b_diff  a3 FY18Q2  2000.0
# 7          a cla FY18Q1    15.0
# 8          b cla FY18Q1 12000.0
# 9          c cla FY18Q1     5.5
# 10    b_diff cla FY18Q1      NA
# 11         a cla FY18Q2    26.0
# 12         b cla FY18Q2 10000.0
# 13         c cla FY18Q2     6.2
# 14    b_diff cla FY18Q2 -2000.0
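
The split and map2_dfr steps above mainly serve to place each b_diff row after the other rows of its car, qtr group. As a sketch of an alternative (not part of the original code, and with the illustrative names b_diff_rows and out), the same placement can be obtained by binding the new rows on and sorting with an explicit key:

```r
library(dplyr)

df <- read.csv(text = "parameter,car,qtr,val
a,a3,FY18Q1,23
b,a3,FY18Q1,10000
a,a3,FY18Q2,14
b,a3,FY18Q2,12000
a,cla,FY18Q1,15
b,cla,FY18Q1,12000
c,cla,FY18Q1,5.5
a,cla,FY18Q2,26
b,cla,FY18Q2,10000
c,cla,FY18Q2,6.2", stringsAsFactors = FALSE)

# Build only the new rows: difference of b from the previous quarter, per car
b_diff_rows <- df %>%
  filter(parameter == "b") %>%
  arrange(car, qtr) %>%
  group_by(car) %>%
  mutate(val = val - lag(val), parameter = "b_diff") %>%
  ungroup()

# Append them and sort so that b_diff comes last within each (car, qtr)
out <- bind_rows(df, b_diff_rows) %>%
  arrange(car, qtr, parameter == "b_diff", parameter)

out
```

The logical sort key parameter == "b_diff" is FALSE for the original rows and TRUE for the new ones, so the new rows sort to the end of each group without needing a group ID.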

DATA

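Note that it is better to read the data with stringsAsFactors = FALSE, so that parameter is a character column and combining it with the new "b_diff" rows does not involve factor coercion.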

text <- "
parameter,car,qtr,val
a,a3,FY18Q1,23
b,a3,FY18Q1,10000
a,a3,FY18Q2,14
b,a3,FY18Q2,12000
a,cla,FY18Q1,15
b,cla,FY18Q1,12000
c,cla,FY18Q1,5.5
a,cla,FY18Q2,26
b,cla,FY18Q2,10000
c,cla,FY18Q2,6.2
"
df <- read.table(textConnection(text), sep = ",", header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2
