Reputation: 51
I have two data frames:
dfset.seed(10)
df <- data.frame(Name = c("Bob","John","Jane","John","Bob","Jane","Jane"),
Date=as.Date(c("2014-06-04", "2013-12-04", "2013-11-04" , "2013-12-06" ,
"2014-01-09", "2014-03-21", "2014-09-24")), Degrees= rnorm(7, mean=32, sd=32))
Name | Date | Degrees
Bob | 2014-06-04 | 50.599877
John | 2013-12-04 | 44.103919
Jane | 2013-11-04 | 6.117422
John | 2013-12-06 | 30.826633
Bob | 2014-01-09 | 59.425444
Jane | 2014-03-21 | 62.473418
Jane | 2014-09-24 | 11.341562
df2
df2 <- data.frame(Name = c("Bob","John","Jane"),
Date=as.Date(c("2014-03-01", "2014-01-20", "2014-06-07")),
Weather = c("Good weather","Bad weather", "Good weather"))
Name | Date | Weather
Bob | 2014-03-01 | Good weather
John | 2014-01-20 | Bad weather
Jane | 2014-06-07 | Good weather
I would like to extract the following:
Name | Date | Weather | Degrees (until this Date) | Other measures
Bob | 2014-03-01 | Good weather | 59.425444 | 50.599877
John | 2014-01-20 | Bad weather | 44.103919, 30.826633 |
Jane | 2014-06-07 | Good weather | 6.117422, 62.473418 | 11.341562
Which is a merge between both df and df2, with:
Upvotes: 0
Views: 147
Reputation: 13122
Another alternative:
#a grouping variable to use for identical splitting
nms = unique(c(as.character(df$Name), as.character(df2$Name)))
#split data
dates = split(df$Date, factor(df$Name, nms))
degrees = split(df$Degrees, factor(df$Name, nms))
thresholds = split(df2$Date, factor(df2$Name, nms))
#mapply the condition
res = do.call(rbind.data.frame,
Map(function(date, thres, deg)
tapply(deg, factor(date <= thres, c(TRUE, FALSE)),
paste0, collapse = ", "),
dates, thresholds, degrees))
#bind with df2
cbind(df2, setNames(res[match(row.names(res), df2$Name), ], c("Degrees", "Other")))
# Name Date Weather Degrees Other
#Bob Bob 2014-03-01 Good weather 41.4254440501603 32.5998774701384
#John John 2014-01-20 Bad weather 26.10391865379, 12.826633094921 <NA>
#Jane Jane 2014-06-07 Good weather -11.8825775975204, 44.4734176224054 -6.65843761374357
Upvotes: 1
Reputation: 54247
Here's one approach:
library(dplyr)
library(tidyr)
library(magrittr)
res <-
left_join(df, df2 %>% select(Name, Date, Weather), by = "Name") %>%
mutate(paste = factor(Date.x <= Date.y, labels = c("before", "other"))) %>%
group_by(Name, paste) %>%
mutate(Degrees = paste(Degrees, collapse = ", ")) %>%
distinct() %>%
spread(paste, Degrees) %>%
group_by(Name, Date.y, Weather) %>%
summarise(other = other[1], before = before[2]) %>%
set_names(c("Name", "Date" , "Weather", "Degrees (until this Date)" , "Other measures"))
res[is.na(res)] <- ""
res
# Name Date Weather Degrees (until this Date) Other measures
# 1 Bob 2014-03-01 Good weather 41.4254440501603 32.5998774701384
# 2 Jane 2014-06-07 Good weather -11.8825775975204, 44.4734176224054 -6.65843761374357
# 3 John 2014-01-20 Bad weather 26.10391865379, 12.826633094921
There may be room for improvements, but anyway.
Upvotes: 1