Reputation: 415
I have a data frame with some NA values. I need the sum of two of the columns. If a value is NA, I need to treat it as zero.
a b c d
1 2 3 4
5 NA 7 8
Column e should be the sum of b and c:
e
5
7
I have tried a lot of things, and done two dozen searches with no luck. It seems like a simple problem. Any help would be appreciated!
Upvotes: 39
Views: 81523
Reputation: 151
I think the easiest way to deal with it in data.table would be:
library(data.table)
# The data
dt <- data.table(a= c(1,5), b= c(2, NA), c= c(3,7), d= c(4,8))
# Sum of two numeric columns with NAs.
dt[, e := rowSums(.SD, na.rm = TRUE), .SDcols = c("b", "c")]
# a b c d e
# 1: 1 2 3 4 5
# 2: 5 NA 7 8 7
Hope it helps.
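One detail about rowSums(..., na.rm = TRUE) that matters here: NA values are dropped before summing, so a row where every selected value is NA sums to 0, not NA. A minimal base-R sketch (the matrix m is made up for illustration):

```r
# Hypothetical 3-row example; the last row is NA in both columns
m <- cbind(b = c(2, NA, NA), c = c(3, 7, NA))

# na.rm = TRUE drops NAs first, so the all-NA row sums to 0
totals <- rowSums(m, na.rm = TRUE)
print(totals)
```

If an all-NA row should stay NA instead, the result has to be patched afterwards, as one of the data.table answers does with ifelse().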
Upvotes: 0
Reputation: 18642
rowwise() is really inefficient for even moderately sized data frames. If a vectorized row-wise variant of the function exists, it will be much faster; for summation that is rowSums(). You can use pick() wrapped in rowSums() to tidy-select the columns you want to sum across:
library(dplyr)
df |>
mutate(e = rowSums(pick(b:c), na.rm = TRUE))
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7
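To illustrate the row-by-row versus vectorized distinction in plain base R: apply() calls sum() once per row at the R level, while rowSums() handles all rows in a single call. The sketch below only checks that both give the same answer and makes no timing claims; dat is built here from the question's example.

```r
# Example data from the question
dat <- data.frame(a = c(1, 5), b = c(2, NA), c = c(3, 7), d = c(4, 8))

# Row by row: one R-level sum() call per row
per_row <- apply(dat[, c("b", "c")], 1, sum, na.rm = TRUE)

# Vectorized: a single call covering every row
vectorized <- rowSums(dat[, c("b", "c")], na.rm = TRUE)

print(per_row)
print(vectorized)
```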
Upvotes: 2
Reputation: 1364
I hope this helps.
In some cases a few of the columns are not numeric; this approach still works, because only the columns selected inside c_across() are summed. Note that c_across() requires dplyr version 1.0.0 or later.
library(dplyr)
df <- data.frame(
TEXT = c("text1", "text2"), a = c(1,5), b = c(2, NA), c = c(3,7), d = c(4,8))
df2 <- df %>%
rowwise() %>%
mutate(e = sum(c_across(a:d), na.rm = TRUE))
# A tibble: 2 x 6
# Rowwise:
# TEXT a b c d e
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 text1 1 2 3 4 10
# 2 text2 5 NA 7 8 20
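A base-R counterpart for a data frame that mixes text and numeric columns: flag the numeric columns with is.numeric() and pass only those to rowSums(). This sketch reuses the df from the answer above; is_num is just an illustrative name.

```r
df <- data.frame(
  TEXT = c("text1", "text2"), a = c(1, 5), b = c(2, NA), c = c(3, 7), d = c(4, 8))

# Logical vector marking which columns are numeric (TEXT is not)
is_num <- vapply(df, is.numeric, logical(1))

# Sum across the numeric columns only, ignoring NAs
df$e <- rowSums(df[, is_num], na.rm = TRUE)
print(df)
```

To restrict the sum to b and c, as the question asks, index by name instead, e.g. df[, c("b", "c")].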
Upvotes: 3
Reputation: 484
If you want to keep NA when both columns are NA, you can use:
Sample data:
library(data.table)
dt <- data.table(x = sample(c(NA, 1, 2, 3), 100, replace = TRUE), y = sample(c(NA, 1, 2, 3), 100, replace = TRUE))
Solution:
dt[, z := ifelse(is.na(x) & is.na(y), NA_real_, rowSums(.SD, na.rm = TRUE)), .SDcols = c("x", "y")]
(the data.table way)
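The same keep-NA rule can be sketched in base R: count the non-missing values in each row and set the total back to NA wherever that count is zero. The small fixed data frame dat2 is made up for illustration.

```r
# Hypothetical data: row 3 is NA in both columns
dat2 <- data.frame(x = c(1, NA, NA), y = c(2, 3, NA))

# Sum with NAs dropped (an all-NA row gives 0 here)
z <- rowSums(dat2, na.rm = TRUE)

# Number of non-missing values in each row
n_present <- rowSums(!is.na(dat2))

# Restore NA where no value was present
z[n_present == 0] <- NA
dat2$z <- z
print(dat2)
```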
Upvotes: 3
Reputation: 3938
A dplyr solution, taken from here:
library(dplyr)
dat %>%
rowwise() %>%
mutate(e = sum(b, c, na.rm = TRUE))
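What rowwise() plus sum(b, c, na.rm = TRUE) computes can be reproduced in base R with mapply(), which calls sum() once for each paired value of b and c. This is only an illustration of the per-row semantics, not how dplyr implements it.

```r
dat <- data.frame(a = c(1, 5), b = c(2, NA), c = c(3, 7), d = c(4, 8))

# Call sum() on each (b, c) pair; na.rm = TRUE drops the NA in row 2
dat$e <- mapply(function(x, y) sum(x, y, na.rm = TRUE), dat$b, dat$c)
print(dat)
```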
Upvotes: 32
Reputation: 32426
dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7
Upvotes: 55
Reputation: 3278
Here is another solution, with nested ifelse() calls:
dat$e <- ifelse(is.na(dat$b) & is.na(dat$c), 0, ifelse(is.na(dat$b), dat$c, ifelse(is.na(dat$c), dat$b, dat$b + dat$c)))
# a b c d e
#1 1 2 3 4 5
#2 5 NA 7 8 7
Edit: here is another solution that uses with(), as suggested by @kasterma in the comments; it is much more readable and straightforward:
dat$e <- with(dat, ifelse(is.na(b) & is.na(c), 0, ifelse(is.na(b), c, ifelse(is.na(c), b, b + c))))
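The nested ifelse() calls grow with every extra column. A variant of the same idea uses a small helper (na_to_zero is a hypothetical name) that replaces NA with 0 in a single vector, so the cleaned columns can simply be added:

```r
dat <- data.frame(a = c(1, 5), b = c(2, NA), c = c(3, 7), d = c(4, 8))

# Hypothetical helper: replace NA with 0, leave other values unchanged
na_to_zero <- function(x) ifelse(is.na(x), 0, x)

# Each column is cleaned first, so plain + no longer propagates NA
dat$e <- na_to_zero(dat$b) + na_to_zero(dat$c)
print(dat)
```

Like the ifelse() answer, this gives 0 when both values are NA.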
Upvotes: 3