Reputation: 197
I can do the following using sapply, but I wanted if dplyr can do it as well, or if it's a problem outside the scope of dplyr.
I have three columns, one with dates (Date), one with date intervals (Intervals), and the third with numeric values (Values).
What I want to do is for each row:
1) Get the value Date
2) See inside which intervals Date is present (Date %within% Intervals), and produce the corresponding boolean vector
3) Use this boolean vector to subset the entire original dataframe
4) Take the mean of the Values for this dataframe
5) Store the value in a new column called mean_for_Date
For the following input the values for the column mean_for_Date would be 3 and 1:
library(lubridate)
library(dplyr)
df <- data.frame( Date = as.Date(c("1998-01-02", "1998-01-06")), Values = c(1, 3))
df$Intervals <- c(interval(df[2, "Date"] - days(2), df[2, "Date"] + days(2)), interval(df[1, "Date"] - days(2), df[1, "Date"] + days(2)))
In sapply I did it like this:
df$mean_for_Date <- sapply(df$Date, function (x) mean(df$Values[x %within% df$Intervals], na.rm = T))
Thanks in advance for all possible advice
Upvotes: 2
Views: 154
Reputation: 28411
dplyr::mutate
and purrr::map_dbl
in the tidyverse
can do the same
library(lubridate) # date
library(magrittr) # %>% and %<>%
library(tidyverse) # mutate and map_dbl
df <- data.frame( Date = as.Date(c("1998-01-02", "1998-01-06")), Values = c(1, 3))
df$Intervals <- c(interval(df[2, "Date"] - days(2), df[2, "Date"] + days(2)),
interval(df[1, "Date"] - days(2), df[1, "Date"] + days(2)))
df %<>%
mutate(mean_for_Date = map_dbl(Date, ~ mean(Values[. %within% Intervals], na.rm = TRUE)))
df
#> Date Values Intervals mean_for_Date
#> 1 1998-01-02 1 1998-01-04 UTC--1998-01-08 UTC 3
#> 2 1998-01-06 3 1997-12-31 UTC--1998-01-04 UTC 1
str(df)
#> 'data.frame': 2 obs. of 4 variables:
#> $ Date : Date, format: "1998-01-02" "1998-01-06"
#> $ Values : num 1 3
#> $ Intervals :Formal class 'Interval' [package "lubridate"] with 3 slots
#> .. ..@ .Data: num 345600 345600
#> .. ..@ start: POSIXct, format: "1998-01-04" ...
#> .. ..@ tzone: chr "UTC"
#> $ mean_for_Date: num 3 1
Created on 2018-03-10 by the reprex package (v0.2.0).
Upvotes: 2
Reputation: 685
Maybe I'm seeing the problem too simplistically? Would this get the mean(as a function of date) given the dates fall in the interval?
df <- df %>%
left_join(df %>%
filter(Date %within% Intervals & !is.na(Values)) %>%
group_by(Date) %>%
summarise(mean_for_Date = mean(Values)) %>%
select(Date,mean_for_Date)), by="Date")
Upvotes: 0
Reputation: 20095
One solution can be achieved by using a dummy
column to apply cartesian join
between each row of data frame df
to self-join it with itsefl.
I have taken away few not required columns while preparing to join. The cartesian join provides all possible combinations of Date
and Intervals
. I have kept the Values
along with Interval
so that when Date
is matched then corresponding Values
can be taken.
IsFound
decides the rows in which Date
is within
Intervals. We need to filter
only those rows with IsFound
true because Values
from those will derive mean
for that date.
The solution will work if a Date
is found within multiple Interval
. The group_by
will finally help to find mean
.
library(dplyr)
df %>% select(-Values, - Intervals) %>% mutate(dummy = 1) %>%
inner_join(select(df, -Date) %>% mutate(dummy = 1), by="dummy") %>%
mutate(IsFound = Date %within% Intervals) %>%
filter(IsFound) %>%
select(-dummy, - IsFound) %>%
group_by(Date) %>%
summarise(Mean = mean(Values))
# A tibble: 2 x 2
# Date Mean
# <date> <dbl>
#1 1998-01-02 3.00
#2 1998-01-06 1.00
Upvotes: 2