Reputation: 3805
I have a df
set.seed(123)
df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9,times = 9),
x = runif(81, min = 0, max = 5))
This is a dataframe which has 9 locations. For each location, I have 9 months and for each month, there is a value of x.
For each location, I want to select a month based on following criteria:
1) Check which months (excluding month 9) have x > 1 and then select that month which is closest to month 9. For example, if for location 1, values of x is
4.56, 3.41, 0.82, 2.31, 3.75, 4.75, 1.22, 2.98, 1.17
then month 1,2,4,5,6,7,8 have x > 1 and from these months, month 8 is closest to month 9. So month 8 will be selected
2) If none of the months have x > 1, simply select that month which has the highest x value. For example:
If for a location, x is
0.8, 0.6, 0.95, 0.4, 0.88, 0.7, 0.6, 0.45, 0.3
then month 3 will be selected (x = 0.95)
I tried this:
library(dplyr)
df %>% filter(month != 9) %>% # removes the 9 month so that only the 8 months are evaluated
group_by(loc.id) %>%
mutate(select.month = x > 1) %>% # mark those months where x > 1
filter(select.month == TRUE) %>% # select those months where x > 1 is true
mutate(dif = 9 - month) %>%# subtract each month from 9 to check which one is closest to 9
summarise(month.id = min(dif)) # select the months which is closest to month 9
However, in the above function I cannot check for those locations where all the month's have value less than 1. My question is how do I change the above code to also check condition 2 when none of the x is > 1
Upvotes: 1
Views: 1823
Reputation: 67778
library(data.table)
setDT(d)
d[ , {
ix <- x > 1 & month != 9
.(month = if(any(ix)) last(month[ix]) else month[which.max(x)])
}, by = loc.id]
Explanation:
For each group (by = loc.id
), get index where x > 1, excluding month 9 (x > 1 & month != 9
). If any such index is true (if(any(ix))
), select the last month among them (last(month[ix])
). Else select month corresponding to max x (else month[which.max(x)]
).
Upvotes: 1
Reputation: 2091
One potential solution is an ifelse
, if position 8 is >1.0, then position 8, else max for that row, after excluding row 9.
An example for one month:
month1 <- filter(df, loc.id == 1)
month1 <- month1[1:8, ]
df1 <- ifelse(month1[8,3] > 1.0, month1[8,3], max(month1$x))
Upvotes: 0
Reputation: 7724
I modified your dataframe a little bit, as there was no case where none of the loc.id
s had only months with less than 1.
df %>%
group_by(loc.id) %>%
filter(month != 9) %>%
mutate(all_x_less_1 = all(x < 1)) %>%
filter(all_x_less_1 | x > 1) %>%
filter(month == if_else(all_x_less_1, month[which.max(x)], month[which.min(9 - month)]))
# A tibble: 9 x 4
# Groups: loc.id [9]
# loc.id month x all_x_less_1
# <int> <int> <dbl> <lgl>
# 1 1 8 4.46 F
# 2 2 7 2.25 F
# 3 3 8 1.18 F
# 4 4 5 1.13 F
# 5 5 1 0.758 T
# 6 6 5 0.715 T
# 7 7 5 0.639 T
# 8 8 2 0.509 T
# 9 9 1 0.395 T
The trick is to filter not only on x < 1
, but also whether all x
in that loc.id
are less than 1. Then with the if_else
call in the filter
you can specify your criteria based on whether all x
are less than 1.
Data
set.seed(123)
df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9,times = 9),
x = runif(81, min = 0, max = 5))
df <- df %>%
mutate(x = x/loc.id)
Upvotes: 1
Reputation: 79228
set.seed(123)
> df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9,times = 9),
x = runif(81, min = 0, max = 5))
> set.seed(123)
> df=rbind(df,cbind(loc.id=10,month=1:9 , x=runif(9)))
> df%>%group_by(loc.id)%>%mutate(x=replace(x,9,0),y=cumsum(x>1))%>%
+ summarise(y=ifelse(all(!y),which.max(x),which.max(y)))
# A tibble: 10 x 2
loc.id y
<dbl> <int>
1 1 8
2 2 8
3 3 8
4 4 7
5 5 8
6 6 8
7 7 7
8 8 8
9 9 7
10 10 5
Upvotes: 1