Reputation: 54
I have a tibble
df = data.frame(col1 = c("A", "B", "C", "A", "A", "B"), col2 = c(0.2, 0.2, 0.6, 1, 0.8, 0.2), id = c(1, 1, 1, 2, 3, 3)) %>% group_by(id) %>% summarise(col1 = list(col1), col2 = list(col2))
which looks like the following
| col1 | col2 | id |
|-----------|-----------------|----|
| [A, B, C] | [0.2, 0.2, 0.6] | 1 |
| [A] | [1] | 2 |
| [A, B, D] | [0.8, 0.1, 0.1] | 3 |
and some parameters
col1_to_add <- c("A", "C", "D")
col2_to_add <- c(0.1, 0.1, 0.1)
rel_ids <- c(2, 3)
and I want to do a kind of nested list "addition" in rows rel_ids
, where I increase the values from col2
"corresponding" to A, C and D by the values col2_to_add
. More precisely, what I want to do with this data is the following:
In each row where df$id
is contained in rel_ids
(in this case, in rows 2 and 3)...
col1_to_add
to col1
if they are not already there, e.g.| col1 | col2 | id |
|-----------------|-----------------|----|
| [A, B, C] | [0.2, 0.2, 0.6] | 1 | <- unchanged
| [A, C, D] | [1] | 2 | <- [C, D] added to col1
| [A, B, C, D] | [0.8, 0.1, 0.1] | 3 | <- [C] added to col1
col2
in the relevant positions| col1 | col2 | id |
|-----------------|----------------------|----|
| [A, B, C] | [0.2, 0.2, 0.6] | 1 | <- unchanged
| [A, C, D] | [1.1, 0.1, 0.1] | 2 | <- A increases by 0.1, C/D gain new 0.1 entries
| [A, B, C, D] | [0.9, 0.1, 0.1, 0.2] | 3 | <- A/D increase by 0.1, B unchanged, C gains new 0.1 entry
I feel comfortable with the first step, however I am not really sure where to start with the second step - I was wondering whether there in efficient way to do this kind of nested list addition (ideally within a Dplyr pipe) without having to store a lot of indices, etc.
Upvotes: 3
Views: 71
Reputation: 886948
We could create another dataset with 'rel_ids', 'col1_to_add', 'col2_add', then do a join by
the 'id', 'col1' after unnest
ing the list
columns in 'df', get the rowSums
of the 'col2' columns in transmute
and if needed, create list columns again by 'id'
library(dplyr)
library(tidyr)
keydat <- crossing(id = rel_ids, col1 = col1_to_add, col2 = col2_to_add)
out <- df %>%
unnest(where(is.list)) %>%
full_join(keydat, by = c("id", "col1")) %>%
transmute(id, col1, col2 = rowSums(across(starts_with("col2")),
na.rm = TRUE)) %>%
arrange(id) %>%
group_by(id) %>%
summarise(across(everything(), list), .groups = 'drop')
-output
> out
# A tibble: 3 × 3
id col1 col2
<dbl> <list> <list>
1 1 <chr [3]> <dbl [3]>
2 2 <chr [3]> <dbl [3]>
3 3 <chr [4]> <dbl [4]>
> out$col1
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "A" "C" "D"
[[3]]
[1] "A" "B" "C" "D"
> out$col2
[[1]]
[1] 0.2 0.2 0.6
[[2]]
[1] 1.1 0.1 0.1
[[3]]
[1] 0.9 0.2 0.1 0.1
Upvotes: 1