Reputation: 109
I have a data set with columns I'd like to merge similar to this:
library(data.table)
DF <- as.data.table(list(ID = c(1,2,3,4,5), Product = c('Y', NA, NA, 'Z', NA), Type = c(NA, 'D', 'G', NA, NA)))
DF
ID Product Type
1 Y NA
2 NA D
3 NA G
4 Z NA
5 NA NA
which I would like to look like this:
DF
ID Product Type Category
1 Y NA Y
2 NA D D
3 NA G G
4 Z NA Z
5 NA NA NA
My Code is:
DF[,Category := na.omit(c(Product,Type)), by = ID][,c("Product","Type"):=NULL]
The problem that I have is that I would like to have for the Category
to be NA
when both Product
and Type
are NA
s. Also, I don't know if my code works because my data set has over 200,000 rows.
Upvotes: 1
Views: 122
Reputation: 887951
We can do this in two assignments and avoid ifelse
as assignment in place (:=
) is faster and efficient.
DF[, Category := Product][is.na(Product), Category := Type][]
# ID Product Type Category
#1: 1 Y NA Y
#2: 2 NA D D
#3: 3 NA G G
#4: 4 Z NA Z
#5: 5 NA NA NA
Or if we assume that there will be only a maximum 1 non-NA value per row for Product/Type, then pmax
can be used.
DF[, Category := pmax(Product, Type, na.rm = TRUE)][]
# ID Product Type Category
#1: 1 Y NA Y
#2: 2 NA D D
#3: 3 NA G G
#4: 4 Z NA Z
#5: 5 NA NA NA
DF1 <- DF[rep(1:nrow(DF), 1e6)]
DF2 <- copy(DF1)
DF3 <- copy(DF1)
system.time(DF1[, Category := Product][is.na(Product), Category := Type])
# user system elapsed
# 0.16 0.06 0.17
system.time(DF2[ , Category := ifelse(is.na(Product), Type, Product)])
# user system elapsed
# 1.35 0.19 1.53
system.time(DF3[ ,Category := pmax(Product, Type, na.rm = TRUE)])
# user system elapsed
# 0.04 0.02 0.06
EDIT: Updated with the benchmarks and it clearly shows both the methods mentioned in my post are efficient.
Upvotes: 4
Reputation: 4965
DF[ , Category := ifelse(is.na(Product), Type, Product)]
# ID Product Type Category
#1: 1 Y NA Y
#2: 2 NA D D
#3: 3 NA G G
#4: 4 Z NA Z
#5: 5 NA NA NA
This is assuming if there are values for both Product
and Type
, you want Product
in Category
Upvotes: 5