user3618451
user3618451

Reputation: 325

Summing across columns with match on dates

I have a dataframe df1 with columns ID and date (as a string "XYYYYMMDD")

ID <- c(101,101,101,102,102)
date <- c("X20170101","X20170103","X20170106","X20170102","X20170104")
df1 <- data.table(ID,date)

ID      date
101 X20170101
101 X20170103
101 X20170106
102 X20170102
102 X20170104

and a dataframe df2 with columns ID and many dates as headers

ID <- c(100,101,102,103,104)
X20170101 <- c(1,NA,NA,2,1)
X20170102 <- c(NA,1,1,1,NA)
X20170103<-c(NA,1,NA,2,1)
X20170104 <- c(2,3,NA,2,1)
X20170105 <- c(1,1,NA,1,NA)
X20170106<-c(NA,1,NA,2,1)         
df2=data.table(ID,X20170101,X20170102,X20170103,X20170104,X20170105,X20170106)

ID X20170101 X20170102 X20170103 X20170104 X20170105 X20170106
100         1        NA        NA         2         1        NA
101        NA         1         1         3         1         1
102        NA         1        NA        NA        NA        NA
103         2         1         2         2         1         2
104         1        NA         1         1        NA         1

I would like to add a column count to df1 containing the following: For each ID (e.g 101) and date (e.g. X20170101) in df1, the sum of the corresponding cells in df2 between that date (e.g.X20170101) and the consecutive one in df1 minus one day (e.g. X20170102, not X20170103). That is, the new df1 should look like this:

ID      date  count
101 X20170101     1
101 X20170103     5
101 X20170106     1
102 X20170102     1
102 X20170104    NA

Thank you for your help.

Upvotes: 1

Views: 64

Answers (2)

moodymudskipper
moodymudskipper

Reputation: 47310

There you go!

library(data.table)
library(dplyr)
library(tidyr)

df2 %>% gather(date,val,-ID) %>%
  full_join(df1 %>% mutate(tag=1)) %>%
  arrange(ID) %>%
  replace_na(list(val=0,tag=0)) %>%
  group_by(ID) %>%
  mutate(grp=cumsum(tag)) %>%
  group_by(ID,grp) %>%
  summarize(count = sum(val),date=head(date,1)) %>%
  ungroup %>%
  mutate(count=ifelse(count== 0,NA,count)) %>%
  select(ID,date,count) %>%
  right_join(df1)

# # A tibble: 5 x 3
#      ID      date count
#   <dbl>     <chr> <dbl>
# 1   101 X20170101     1
# 2   101 X20170103     5
# 3   101 X20170106     1
# 4   102 X20170102     1
# 5   102 X20170104    NA

Upvotes: 1

Stephen Witkowski
Stephen Witkowski

Reputation: 374

Using the dplyr package:

library(dplyr)
count <- aggregate(df1$date), by = list(df1$date), FUN = length)
df1 <- merge(df1, count, by = "date", all.x = TRUE)

Let me know if this works!

Upvotes: 1

Related Questions