I'm quite new to R and to lapply. I have a large data frame, and I'm trying to use lapply to output the sums of several subsets of it.
group_a | group_b | n_variants_a | n_variants_b |
---|---|---|---|
1 | NA | 1 | 2 |
NA | 2 | 5 | 4 |
1 | 2 | 2 | 0 |
I want to look at subsets based on several different group columns (group_a, group_b) and sum each of the n_variants columns.
Running this for just one group and one n_variants column works:
sum(subset(df, !is.na(group_a))$n_variants_a)
However, I want to sum every n_variants column for every grouping. My lapply script for this returns 0 for each sum:
summed_variants <- lapply(list_of_groups, function(g) {
  lapply(list_of_variants, function(v) {
    sum(subset(df, !(is.na(g)))$v)
  })
})
I was wondering if I need to use paste0 to paste the list of variants in, but I couldn't get this to work.
Thanks for your help!
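A likely reason for the zeros, assuming list_of_groups and list_of_variants are character vectors of column names: inside the anonymous functions, g and v are plain strings. So !is.na(g) tests the string itself and is simply TRUE (subset keeps every row), and $v looks for a column literally named v. No such column exists, so it returns NULL, and sum(NULL) is 0. Indexing with [[ ]] accepts a name stored in a variable, so paste0 is not needed. The sketch below keeps the nested structure from the question but indexes with [[ ]]; the two name vectors are illustrative assumptions.

```r
# Example data from the question
df <- data.frame(
  group_a      = c(1L, NA, 1L),
  group_b      = c(NA, 2L, 2L),
  n_variants_a = c(1L, 5L, 2L),
  n_variants_b = c(2L, 4L, 0L)
)

# Assumption: the "lists" hold column names as strings
list_of_groups   <- c("group_a", "group_b")
list_of_variants <- c("n_variants_a", "n_variants_b")

# setNames(nm = x) names each element after itself, so the result is labelled
summed_variants <- lapply(setNames(nm = list_of_groups), function(g) {
  keep <- !is.na(df[[g]])  # rows where this group column is not NA
  # [[v]] looks up the column whose name is stored in v
  sapply(setNames(nm = list_of_variants), function(v) sum(df[[v]][keep]))
})

summed_variants
```

Like the nested loops in the question, this computes every group × variant combination (for group_a: 3 and 2; for group_b: 7 and 4). The matched pairs, group_a with n_variants_a and group_b with n_variants_b, give 3 and 4.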
We may use Map/mapply for this: loop over the group column names and their corresponding 'n_variants' column names (assuming they are in the same order), extract the columns by name, apply the condition (!is.na) to the group column, use it to subset the 'n_variants' column, and get the sum.
mapply(function(x, y) sum(df1[[y]][!is.na(df1[[x]])]),
       names(df1)[1:2], names(df1)[3:4])
-output
group_a group_b
      3       4
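The mapply call above pairs the columns by position (names(df1)[1:2] with names(df1)[3:4]). If the real data frame has many columns, possibly in a different order, a sketch that pairs them by name instead could look like this. It assumes every group column is named group_<suffix> and its partner n_variants_<suffix>, as in the example. Map is mapply with SIMPLIFY = FALSE, so it returns a list named after the group columns.

```r
df1 <- data.frame(
  group_a      = c(1L, NA, 1L),
  group_b      = c(NA, 2L, 2L),
  n_variants_a = c(1L, 5L, 2L),
  n_variants_b = c(2L, 4L, 0L)
)

# Assumed naming convention: group_<x> pairs with n_variants_<x>
group_cols   <- grep("^group_", names(df1), value = TRUE)
variant_cols <- sub("^group_", "n_variants_", group_cols)

# For each pair, keep rows where the group is non-NA and sum the variants
res <- Map(function(x, y) sum(df1[[y]][!is.na(df1[[x]])]),
           group_cols, variant_cols)

unlist(res)
```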
Another option is to use the tidyverse. Loop across the 'n_variants' columns, get the current column name (cur_column()), replace the substring 'n_variants' with 'group', get the values of that group column, use them to create the condition that subsets the current column, and get the sum.
library(stringr)
library(dplyr)
df1 %>%
summarise(across(contains('variants'),
~ sum(.x[!is.na(get(str_replace(cur_column(), 'n_variants', 'group')))])))
-output
n_variants_a n_variants_b
1 3 4
data
df1 <- structure(list(group_a = c(1L, NA, 1L), group_b = c(NA, 2L, 2L),
    n_variants_a = c(1L, 5L, 2L), n_variants_b = c(2L, 4L, 0L)),
    class = "data.frame", row.names = c(NA, -3L))
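For a large data frame, a fully vectorised base R variant may also be worth trying as a possible alternative: turn the group columns into a logical matrix with is.na, multiply the matching n_variants columns by it so that rows with an NA group contribute 0, and take colSums. This sketch assumes the two column sets line up position by position and that the n_variants columns themselves contain no NA (an NA there would propagate into the sum, since NA * 0 is NA).

```r
df1 <- structure(list(group_a = c(1L, NA, 1L), group_b = c(NA, 2L, 2L),
    n_variants_a = c(1L, 5L, 2L), n_variants_b = c(2L, 4L, 0L)),
    class = "data.frame", row.names = c(NA, -3L))

group_cols   <- c("group_a", "group_b")            # assumed to line up with...
variant_cols <- c("n_variants_a", "n_variants_b")  # ...these, position by position

# TRUE where the group is present; arithmetic coerces TRUE/FALSE to 1/0
present <- !is.na(as.matrix(df1[group_cols]))

# Zero out variants in rows without a group, then sum each column;
# the result takes its names from the first operand (the variant columns)
sums <- colSums(as.matrix(df1[variant_cols]) * present)
sums
```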