Reputation: 325
I have a dataframe df1 with columns ID and date (as a string "XYYYYMMDD")
ID <- c(101,101,101,102,102)
date <- c("X20170101","X20170103","X20170106","X20170102","X20170104")
df1 <- data.table(ID,date)
ID date
101 X20170101
101 X20170103
101 X20170106
102 X20170102
102 X20170104
and a dataframe df2 with columns ID and many dates as headers
ID <- c(100,101,102,103,104)
X20170101 <- c(1,NA,NA,2,1)
X20170102 <- c(NA,1,1,1,NA)
X20170103<-c(NA,1,NA,2,1)
X20170104 <- c(2,3,NA,2,1)
X20170105 <- c(1,1,NA,1,NA)
X20170106<-c(NA,1,NA,2,1)
df2=data.table(ID,X20170101,X20170102,X20170103,X20170104,X20170105,X20170106)
ID X20170101 X20170102 X20170103 X20170104 X20170105 X20170106
100 1 NA NA 2 1 NA
101 NA 1 1 3 1 1
102 NA 1 NA NA NA NA
103 2 1 2 2 1 2
104 1 NA 1 1 NA 1
I would like to add a column count to df1 containing the following: For each ID (e.g 101) and date (e.g. X20170101) in df1, the sum of the corresponding cells in df2 between that date (e.g.X20170101) and the consecutive one in df1 minus one day (e.g. X20170102, not X20170103). That is, the new df1 should look like this:
ID date count
101 X20170101 1
101 X20170103 5
101 X20170106 1
102 X20170102 1
102 X20170104 NA
Thank you for your help.
Upvotes: 1
Views: 64
Reputation: 47310
There you go!
library(data.table)
library(dplyr)
library(tidyr)
df2 %>% gather(date,val,-ID) %>%
full_join(df1 %>% mutate(tag=1)) %>%
arrange(ID) %>%
replace_na(list(val=0,tag=0)) %>%
group_by(ID) %>%
mutate(grp=cumsum(tag)) %>%
group_by(ID,grp) %>%
summarize(count = sum(val),date=head(date,1)) %>%
ungroup %>%
mutate(count=ifelse(count== 0,NA,count)) %>%
select(ID,date,count) %>%
right_join(df1)
# # A tibble: 5 x 3
# ID date count
# <dbl> <chr> <dbl>
# 1 101 X20170101 1
# 2 101 X20170103 5
# 3 101 X20170106 1
# 4 102 X20170102 1
# 5 102 X20170104 NA
Upvotes: 1
Reputation: 374
Using the dplyr
package:
library(dplyr)
count <- aggregate(df1$date), by = list(df1$date), FUN = length)
df1 <- merge(df1, count, by = "date", all.x = TRUE)
Let me know if this works!
Upvotes: 1