Elis
Elis

Reputation: 54

Nested list "addition"

I have a tibble

df = data.frame(col1 = c("A", "B", "C", "A", "A", "B"), col2 = c(0.2, 0.2, 0.6, 1, 0.8, 0.2), id = c(1, 1, 1, 2, 3, 3)) %>% group_by(id) %>% summarise(col1 = list(col1), col2 = list(col2))

which looks like the following

| col1      | col2            | id |
|-----------|-----------------|----|
| [A, B, C] | [0.2, 0.2, 0.6] | 1  |
| [A]       | [1]             | 2  |
| [A, B, D] | [0.8, 0.1, 0.1] | 3  |

and some parameters

col1_to_add <- c("A", "C", "D")
col2_to_add <- c(0.1, 0.1, 0.1)
rel_ids <- c(2, 3)

and I want to do a kind of nested list "addition" in rows rel_ids, where I increase the values from col2 "corresponding" to A, C and D by the values col2_to_add. More precisely, what I want to do with this data is the following:

In each row where df$id is contained in rel_ids (in this case, in rows 2 and 3)...

  1. add elements from col1_to_add to col1 if they are not already there, e.g.
| col1            | col2            | id |
|-----------------|-----------------|----|
| [A, B, C]       | [0.2, 0.2, 0.6] | 1  | <- unchanged
| [A, C, D]       | [1]             | 2  | <- [C, D] added to col1
| [A, B, C, D]    | [0.8, 0.1, 0.1] | 3  | <- [C] added to col1
  1. increment the values from col2 in the relevant positions
| col1            | col2                 | id |
|-----------------|----------------------|----|
| [A, B, C]       | [0.2, 0.2, 0.6]      | 1  | <- unchanged
| [A, C, D]       | [1.1, 0.1, 0.1]      | 2  | <- A increases by 0.1, C/D gain new 0.1 entries
| [A, B, C, D]    | [0.9, 0.1, 0.1, 0.2] | 3  | <- A/D increase by 0.1, B unchanged, C gains new 0.1 entry

I feel comfortable with the first step, however I am not really sure where to start with the second step - I was wondering whether there in efficient way to do this kind of nested list addition (ideally within a Dplyr pipe) without having to store a lot of indices, etc.

Upvotes: 3

Views: 71

Answers (1)

akrun
akrun

Reputation: 886948

We could create another dataset with 'rel_ids', 'col1_to_add', 'col2_add', then do a join by the 'id', 'col1' after unnesting the list columns in 'df', get the rowSums of the 'col2' columns in transmute and if needed, create list columns again by 'id'

library(dplyr)
library(tidyr)

keydat <- crossing(id = rel_ids, col1 = col1_to_add, col2 = col2_to_add)
out <- df %>% 
   unnest(where(is.list)) %>%
   full_join(keydat, by = c("id", "col1")) %>%
   transmute(id, col1, col2 = rowSums(across(starts_with("col2")), 
      na.rm = TRUE)) %>% 
   arrange(id) %>% 
   group_by(id) %>%
   summarise(across(everything(), list), .groups = 'drop')

-output

> out
# A tibble: 3 × 3
     id col1      col2     
  <dbl> <list>    <list>   
1     1 <chr [3]> <dbl [3]>
2     2 <chr [3]> <dbl [3]>
3     3 <chr [4]> <dbl [4]>
> out$col1
[[1]]
[1] "A" "B" "C"

[[2]]
[1] "A" "C" "D"

[[3]]
[1] "A" "B" "C" "D"

> out$col2
[[1]]
[1] 0.2 0.2 0.6

[[2]]
[1] 1.1 0.1 0.1

[[3]]
[1] 0.9 0.2 0.1 0.1

Upvotes: 1

Related Questions