Alex
Alex

Reputation: 2780

The dplyr way to get grouped differences

I am trying to figure out the dplyr way to do grouped differences.

Here is some fake data:

>crossing(year=seq(1,4),week=seq(1,3)) %>% 
  mutate(value = c(rep(4,3),rep(3,3),rep(2,3),rep(1,3)))

    year  week value
   <int> <int> <dbl>
 1     1     1     4
 2     1     2     4
 3     1     3     4
 4     2     1     3
 5     2     2     3
 6     2     3     3
 7     3     1     2
 8     3     2     2
 9     3     3     2
10     4     1     1
11     4     2     1
12     4     3     1

What I would like is year 1- year2, year2-year3, and year3-year4. The result would like like the following.

    year  week diffs
   <int> <int> <dbl>
 1     1     1     1
 2     1     2     1
 3     1     3     1
 4     2     1     1
 5     2     2     1
 6     2     3     1
 7     3     1     1
 8     3     2     1
 9     3     3     1

Edit:

I apologize. I was trying to make a simple reprex, but I messed up a lot.

Please let me know what the proper etiquette is. I don't want to ruffle any feathers.

  1. I did not know that -diff() was a function. What I am actually looking for is percent difference ((new-old)/old)*100 and I am not able to find a straight forward way to use diff to get that value.

  2. I am starting from the largest year. Adding a arrange(desc(year)) to the above code is what I have. I would be trimming the smallest year not the largest.

If this edit with worth a separate question let me know.

Upvotes: 0

Views: 55

Answers (3)

moodymudskipper
moodymudskipper

Reputation: 47300

using dplyr and do:

library(dplyr)
df %>% group_by(week) %>% do(cbind(.[-nrow(.),1:2],diffs=-diff(.$value)))

# # A tibble: 9 x 3
# # Groups:   week [3]
#    year  week diffs
#   <int> <int> <dbl>
# 1     1     1     1
# 2     2     1     1
# 3     3     1     1
# 4     1     2     1
# 5     2     2     1
# 6     3     2     1
# 7     1     3     1
# 8     2     3     1
# 9     3     3     1

Upvotes: 0

alistaire
alistaire

Reputation: 43334

You can use diff, but it needs adjusting, as it subtracts the other way and returns a vector that's one shorter than what it's passed:

library(tidyverse)

diffed <- crossing(year = seq(1,4), 
                   week = seq(1,3)) %>% 
    mutate(value = rep(4:1, each = 3)) %>% 
    group_by(week) %>% 
    mutate(value = c(-diff(value), NA)) %>% 
    drop_na(value)

diffed
#> # A tibble: 9 x 3
#> # Groups:   week [3]
#>    year  week value
#>   <int> <int> <int>
#> 1     1     1     1
#> 2     1     2     1
#> 3     1     3     1
#> 4     2     1     1
#> 5     2     2     1
#> 6     2     3     1
#> 7     3     1     1
#> 8     3     2     1
#> 9     3     3     1

Upvotes: 2

akuiper
akuiper

Reputation: 214927

If you don't have missing years for each week:

df %>% 
    arrange(year) %>%
    group_by(week) %>% 
    mutate(diffs = value - lead(value)) %>% 
    na.omit() %>% select(-value)

# A tibble: 9 x 3
# Groups:   week [3]
#   year  week diffs
#  <int> <int> <dbl>
#1     1     1     1
#2     1     2     1
#3     1     3     1
#4     2     1     1
#5     2     2     1
#6     2     3     1
#7     3     1     1
#8     3     2     1
#9     3     3     1

Upvotes: 3

Related Questions