Reputation: 4398
I have a dataset, z, that I wish to calculate the growth increase by the type:
location size type date
ny 5 hello 10/01/2020
ny 7 ai 10/02/2020
ny 8 ai 10/03/2020
ny 6 hello 10/04/2020
ca 15 cool 10/05/2020
ca 10 name 10/06/2020
ca 5 name 10/07/2020
ca 16 cool 10/08/2020
Desired output
location type increase percent_increase start_date end_date
ca cool 1 6.67% 10/05/2020 10/08/2020
ca name -5 -50% 10/6/2020 10/7/2020
ny hello 1 20% 10/01/2020 10/4/2020
ny ai 1 14.28% 10/2/2020 10/3/2020
This is what I am doing:
library(tidyverse)
z %>%
group_by(type, location) %>%
mutate(percent_increase = (size/lead(size) - 1) * 100)
I am not getting my desired output. Any assistance is appreciated.
Upvotes: 1
Views: 459
Reputation: 988
To get the results you want, you need a different calculation in your mutate
line:
I also added a filter
to remove any results with NA
for the percent_increase
variable.
And finally added ```arrange`` to sort alphabetically by location to match the same order as your requested output.
CODE
z %>% group_by(type, location) %>% mutate( increase = (lead(size) - size), percent_increase = (increase/size) * 100, start_date = date, end_date = lead(date)) %>% filter(!is.na(percent_increase)) %>% arrange(location)
OUTPUT
# A tibble: 4 x 8
# Groups: type, location [4]
location size type date increase percent_increase start_date end_date
<chr> <int> <chr> <chr> <int> <dbl> <chr> <chr>
1 ca 15 cool 10/05/2020 1 6.67 10/05/2020 10/08/2020
2 ca 10 name 10/06/2020 -5 -50 10/06/2020 10/07/2020
3 ny 5 hello 10/01/2020 1 20 10/01/2020 10/04/2020
4 ny 7 ai 10/02/2020 1 14.3 10/02/2020 10/03/2020
INPUT
z <- structure(list(location = c("ny", "ny", "ny", "ny", "ca", "ca",
"ca", "ca"), size = c(5L, 7L, 8L, 6L, 15L, 10L, 5L, 16L), type = c("hello",
"ai", "ai", "hello", "cool", "name", "name", "cool"), date = c("10/01/2020",
"10/02/2020", "10/03/2020", "10/04/2020", "10/05/2020", "10/06/2020",
"10/07/2020", "10/08/2020")), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 3
Reputation:
you're missing arrange function to organize by date
like this:
z %>%
group_by(type, location) %>%
arrange(date) %>%
mutate(percent_increase = (size/lead(size) - 1) * 100)
Upvotes: 2