Reputation: 797
I would like to compare the values within each group of a data.frame using dplyr, and create a dummy variable, or something similar, indicating which one is bigger. I couldn't figure it out!
Here is some reproducible code:
table <- structure(list(species = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Adelophryne adiastola",
"Adelophryne gutturosa"), class = "factor"), scenario = structure(c(3L,
1L, 2L, 3L, 1L, 2L), .Label = c("future1", "future2", "present"
), class = "factor"), amount = c(5L, 3L, 2L, 50L, 60L, 40L)), .Names = c("species",
"scenario", "amount"), class = "data.frame", row.names = c(NA,
-6L))
> table
                species scenario amount
1 Adelophryne adiastola  present      5
2 Adelophryne adiastola  future1      3
3 Adelophryne adiastola  future2      2
4 Adelophryne gutturosa  present     50
5 Adelophryne gutturosa  future1     60
6 Adelophryne gutturosa  future2     40
I would group the df by species.
I want to create a new column, say increase_amount, where the amount in every "future" scenario is compared to the "present" one. It should be 1 when the value has increased and 0 when it has decreased.
I have been trying a for loop that goes through each of the species, but the df contains over 50,000 of them, and it takes too long given how often I will have to redo the operation.
Does anyone know a way? Thanks a lot!
Upvotes: 3
Views: 6466
Reputation: 1114
It sounds like you could use lag() to quickly find the difference over time. I would suggest restructuring your scenario (time) variable so that it can be ordered intuitively by R functions (for example, arrange() sorts your scenario variable alphabetically as future1, future2, present, which won't work in this case).
library(dplyr) # provides %>%, glimpse(), count(), arrange(), lag()
df <- data.frame(species=rep(letters,3),
                 scenario=rep(1:3,26),
                 amount=runif(78))
summary(df)
glimpse(df)
df %>% count(species,scenario)
df %>%
  arrange(species,scenario) %>% # arrange scenario by ascending order
  group_by(species) %>%
  mutate(diff1=amount-lag(amount), # calculate difference from time 1 -> 2, and time 2 -> 3
         diff2=amount-lag(amount,2)) # calculate difference from time 1 -> 3
The output from lag() will contain NAs for the first scenario value within each group, but the results can easily be adjusted using ifelse() statements or filter().
df %>%
  arrange(species,scenario) %>% group_by(species) %>%
  mutate(diff1=amount-lag(amount)) %>%
  filter(diff1>0)
df %>%
  arrange(species,scenario) %>% group_by(species) %>%
  mutate(diff1=amount-lag(amount)) %>%
  mutate(diff.incr=ifelse(diff1>0,'increase','no increase'))
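For the data in the question, one way to apply the reordering idea is to make scenario a factor whose levels are in chronological order, so that arrange() puts "present" first within each species. This is only a minimal sketch: it assumes every species has exactly one "present" row, it compares against first(amount) rather than lag(), and the column name diff_present is made up for illustration.

```r
library(dplyr)

# Small stand-in for the question's data
table <- data.frame(
  species  = rep(c("Adelophryne adiastola", "Adelophryne gutturosa"), each = 3),
  scenario = rep(c("present", "future1", "future2"), times = 2),
  amount   = c(5L, 3L, 2L, 50L, 60L, 40L)
)

result <- table %>%
  # Chronological level order, so sorting is no longer alphabetical
  mutate(scenario = factor(scenario, levels = c("present", "future1", "future2"))) %>%
  arrange(species, scenario) %>%
  group_by(species) %>%
  # After sorting, the first row of each group is the "present" row
  mutate(diff_present    = amount - first(amount),
         increase_amount = as.integer(diff_present > 0)) %>%
  ungroup()

result
```

The "present" rows get diff_present equal to 0, so they end up with increase_amount 0 rather than NA.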
Upvotes: 0
Reputation: 887991
We can do this with ave from base R
table$increase_amount <- with(table, as.integer(amount > ave(amount *
(scenario == "present"), species, FUN = function(x) x[x!=0])))
table$increase_amount
#[1] 0 0 0 0 1 0
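Note that x[x != 0] relies on the "present" amount being non-zero: if a species had a present amount of 0, nothing would be left to recycle and the call would not work as intended. If zeros are possible, a lookup with match() is one base R alternative. The sketch below assumes exactly one "present" row per species; the names present and baseline are introduced here only for illustration.

```r
# Small stand-in for the question's data
table <- data.frame(
  species  = rep(c("Adelophryne adiastola", "Adelophryne gutturosa"), each = 3),
  scenario = rep(c("present", "future1", "future2"), times = 2),
  amount   = c(5L, 3L, 2L, 50L, 60L, 40L)
)

# Keep only the "present" rows (assumed: one per species)
present <- table[table$scenario == "present", ]

# For every row, look up the present amount of the same species
baseline <- present$amount[match(table$species, present$species)]

# 1 if the amount is above the species' present value, otherwise 0
table$increase_amount <- as.integer(table$amount > baseline)
table$increase_amount
```

Because match() is vectorised, this avoids looping over the species.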
Upvotes: 0
Reputation: 5714
You can do something like this:
table %>%
  group_by(species) %>%
  mutate(tmp = amount[scenario == "present"]) %>%
  mutate(increase_amount = ifelse(amount > tmp, 1, 0))
# Source: local data frame [6 x 5]
# Groups: species [2]
#
#                 species scenario amount   tmp increase_amount
#                  <fctr>   <fctr>  <int> <int>           <dbl>
# 1 Adelophryne adiastola  present      5     5               0
# 2 Adelophryne adiastola  future1      3     5               0
# 3 Adelophryne adiastola  future2      2     5               0
# 4 Adelophryne gutturosa  present     50    50               0
# 5 Adelophryne gutturosa  future1     60    50               1
# 6 Adelophryne gutturosa  future2     40    50               0
Upvotes: 5