rjb
rjb

Reputation: 55

R update a tibble using data from a second tibble as row and column

I have an empty tibble full of NA's with the rows and columns, respectively named for id numbers and all the dates in a period of time. For example, this code:

tbl <- tibble(PERSONAL_ID = c("A", "B", "C", "D"))
dates = as.character((seq(as.Date("2016-01-01"), as.Date("2016-01-05"), by="days")))
tbl[dates] <- NA
tbl <- column_to_rownames(tbl, var = "PERSONAL_ID")

I have a second tibble that contains columns matching up one ID number with one date, as in this example:

enrollments <- tibble(ID = c("D", "B", "C", "D"),
                      date = c("2016-01-01", "2016-01-03", "2016-01-05", "2016-01-02"))

What I would like to do is add "1" to the row and column of the first tibble (tbl) corresponding to the ID and date listed in the second tibble (enrollments). For the example code above, the desired output would be:

  2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A       <NA>       <NA>       <NA>        <NA>      <NA>
B       <NA>       <NA>          1        <NA>      <NA>
C       <NA>       <NA>       <NA>        <NA>         1
D          1          1       <NA>        <NA>      <NA>

Thank you!

Upvotes: 2

Views: 748

Answers (3)

langtang
langtang

Reputation: 24722

Here's a data.table approach, using PERSONAL_IDS, dates, and enrollments

dcast(rbind(
  rbindlist(lapply(setdiff(PERSONAL_ID,enrollments$ID), \(x) data.table(ID=x, date=dates)))[,value:=NA],
  enrollments[,value:=1]
  ), ID~date,value.var="value")

Output:

   ID 2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
1:  A         NA         NA         NA         NA         NA
2:  B         NA         NA          1         NA         NA
3:  C         NA         NA         NA         NA          1
4:  D          1          1         NA         NA         NA

Upvotes: 0

AndrewGB
AndrewGB

Reputation: 16856

Here is a base R approach. For enrollments, I combine the date and ID into one string for each row. Then, for tbl, I create a table that is filled with the date and rowname. Then, I use match to match the values from enrollment with those in tbl. I use arrayInd to get the column and row index. Finally, I replace the NA values in tbl with 1 for the extracted column-row indices.

replace(tbl, arrayInd(match(do.call(
  paste, subset(enrollments, select = c("date", "ID"))),
t(outer(colnames(tbl), rownames(tbl), FUN = paste)
)), .dim = dim(tbl)), 1)

Here is a slightly different tidyverse approach by using rows_update. Here, I pivot enrollments to a wide format, then bind to tbl (but with no rows) in order to have the same columns. Then, I update the rows in tbl with the new format from enrollments.

library(tidyverse)

rows_update(tbl %>% rownames_to_column("ID"), enrollments %>% 
              mutate(value = 1) %>% 
              pivot_wider(names_from = "date", values_from = "value") %>% 
              bind_rows(tbl[0,])) %>% 
  column_to_rownames("ID")

Output

  2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A         NA         NA         NA         NA         NA
B         NA         NA          1         NA         NA
C         NA         NA         NA         NA          1
D          1          1         NA         NA         NA

Upvotes: 2

benson23
benson23

Reputation: 19097

Here's a tidyverse approach.

  1. First change your tbl from a wide format to a long format so that it matches the format of enrollments.
  2. Create a Count column in enrollments, and every row would be Count = 1.
  3. Then left_join the transformed tbl with enrollments using ID and date as the joining field.
  4. Finally, transform the long format back to a wide format and set rownames.
library(tidyverse)

left_join(tbl %>% rownames_to_column(var = "ID") %>% 
            pivot_longer(-ID, names_to = "date", values_to = "Count") %>% 
            select(-Count), 
          enrollments %>% mutate(Count = 1), 
          by = c("ID", "date")) %>%
  pivot_wider(names_from = "date", values_from = "Count") %>%
  column_to_rownames(var = "ID")

Output

 2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A         NA         NA         NA         NA         NA
B         NA         NA          1         NA         NA
C         NA         NA         NA         NA          1
D          1          1         NA         NA         NA

Your dataset as reference

tbl
  2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A         NA         NA         NA         NA         NA
B         NA         NA         NA         NA         NA
C         NA         NA         NA         NA         NA
D         NA         NA         NA         NA         NA
enrollments
# A tibble: 4 x 2
  ID    date      
  <chr> <chr>     
1 D     2016-01-01
2 B     2016-01-03
3 C     2016-01-05
4 D     2016-01-02

Upvotes: 2

Related Questions