Reputation: 2017
This is my dataframe:
dput(test)
structure(list(Branch = c("11 Oktomvri", "11 Oktomvri", "11 Oktomvri",
"11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "Aerodrom",
"Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom",
"Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2",
"Aerodrom 2", "Aerodrom 2", "Bitola", "Bitola", "Bitola", "Bitola",
"Bitola", "Bitola", "Bitola"), period = c("January", "February",
"March", "April", "May", "June", "July", "January", "February",
"March", "April", "May", "June", "July", "January", "February",
"March", "April", "May", "June", "July", "January", "February",
"March", "April", "May", "June", "July"), value = c(1513, 1511,
1520, 1524, 1508, 1504, 1517, 1364, 1381, 1400, 1403, 1401, 1406,
1430, 674, 687, 689, 690, 696, 705, 715, 4400, 4393, 4365, 4342,
4345, 4373, 4389)), .Names = c("Branch", "period", "value"), row.names = c(NA,
-28L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = c("Branch",
"period"), drop = TRUE, indices = list(3L, 1L, 0L, 6L, 5L, 2L,
4L, 10L, 8L, 7L, 13L, 12L, 9L, 11L, 17L, 15L, 14L, 20L, 19L,
16L, 18L, 24L, 22L, 21L, 27L, 26L, 23L, 25L), group_sizes = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
Branch = c("11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "11 Oktomvri",
"11 Oktomvri", "11 Oktomvri", "11 Oktomvri", "Aerodrom",
"Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom", "Aerodrom",
"Aerodrom", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Aerodrom 2",
"Aerodrom 2", "Aerodrom 2", "Aerodrom 2", "Bitola", "Bitola",
"Bitola", "Bitola", "Bitola", "Bitola", "Bitola"), period = c("April",
"February", "January", "July", "June", "March", "May", "April",
"February", "January", "July", "June", "March", "May", "April",
"February", "January", "July", "June", "March", "May", "April",
"February", "January", "July", "June", "March", "May")), row.names = c(NA,
-28L), class = "data.frame", vars = c("Branch", "period"), drop = TRUE, .Names = c("Branch",
"period")))
I'm not sure how to group the data based on Branch and Period, and calculate value based on period_n - period_n-1.
The output should be:
city period value diff_n_1
Bitola March 4365 -28
Bitola April 2000 13
My attepmt:
results <- sample2 %>%
group_by(Branch, period) %>%
arrange(Branch) %>%
mutate(lagged_period = lag(value), client_diff = value - lagged_period)
I'm not sure how to pull-off the last line.
Any ideas?
Upvotes: 1
Views: 35
Reputation: 5689
I think you had it nearly right, but you don't need the Period
in group_by
, since that really is the variable we are using lag
on. Currently, your above work is resulting in exclusively NAs
because there is only one element in each defined group (so no lag is available).
This should work:
library(dplyr)
sample2 %>%
group_by(Branch) %>%
arrange(Branch) %>%
mutate(lagged_period = lag(value),
client_diff = value - lagged_period)
Then, of course, if you wanted to remove the NAs
from the calculated field, you can just pipe the above into:
filter(!is.na(client_diff))
Upvotes: 2