flightless13wings
flightless13wings

Reputation: 109

Merge 2 columns in R

I have a data set with columns I'd like to merge similar to this:

library(data.table)
DF <- as.data.table(list(ID = c(1,2,3,4,5), Product = c('Y', NA, NA, 'Z', NA), Type = c(NA, 'D', 'G', NA, NA)))

DF
ID   Product Type
1       Y     NA
2       NA    D
3       NA    G
4       Z     NA
5       NA    NA

which I would like to look like this:

DF
ID   Product Type  Category
1       Y     NA      Y
2       NA    D       D
3       NA    G       G
4       Z     NA      Z
5       NA    NA      NA

My Code is:

  DF[,Category := na.omit(c(Product,Type)), by = ID][,c("Product","Type"):=NULL]

The problem that I have is that I would like to have for the Category to be NA when both Product and Type are NAs. Also, I don't know if my code works because my data set has over 200,000 rows.

Upvotes: 1

Views: 122

Answers (2)

akrun
akrun

Reputation: 887951

We can do this in two assignments and avoid ifelse as assignment in place (:=) is faster and efficient.

DF[, Category := Product][is.na(Product), Category := Type][]
#      ID Product Type Category
#1:  1       Y   NA        Y
#2:  2      NA    D        D
#3:  3      NA    G        G
#4:  4       Z   NA        Z
#5:  5      NA   NA       NA

Or if we assume that there will be only a maximum 1 non-NA value per row for Product/Type, then pmax can be used.

DF[, Category := pmax(Product, Type, na.rm = TRUE)][]
#    ID Product Type Category
#1:  1       Y   NA        Y
#2:  2      NA    D        D
#3:  3      NA    G        G
#4:  4       Z   NA        Z
#5:  5      NA   NA       NA

Benchmarks

DF1 <- DF[rep(1:nrow(DF), 1e6)]
DF2 <- copy(DF1)
DF3 <- copy(DF1)
system.time(DF1[, Category := Product][is.na(Product), Category := Type])
#  user  system elapsed 
#  0.16    0.06    0.17 
system.time(DF2[ , Category := ifelse(is.na(Product), Type, Product)])
#  user  system elapsed 
#  1.35    0.19    1.53 

system.time(DF3[ ,Category := pmax(Product, Type, na.rm = TRUE)])
#  user  system elapsed 
#  0.04    0.02    0.06 

EDIT: Updated with the benchmarks and it clearly shows both the methods mentioned in my post are efficient.

Upvotes: 4

Sumedh
Sumedh

Reputation: 4965

DF[ , Category := ifelse(is.na(Product), Type, Product)]

#   ID Product Type Category
#1:  1       Y   NA        Y
#2:  2      NA    D        D
#3:  3      NA    G        G
#4:  4       Z   NA        Z
#5:  5      NA   NA       NA

This is assuming if there are values for both Product and Type, you want Product in Category

Upvotes: 5

Related Questions