Pree
Pree

Reputation: 77

How to duplicate rows of existing table and change value of one column based on a simple condition

I am a beginner in R, and I would appreciate help from experts for this problem. I have Products table like this:

Table1 : Existing Products

ProductId Width  Thickness
CM72      4.2     12
CM73      1.2     76
CM84      3.5     32
CM93      2.1     31

Table2 : New Products

ProductId    
CG72
CG93

I want to append the ProductId of the NewProducts table to the existing Products table. And I want to copy the Width and Thickness and Length from the similar numbered existing product. CM72 -> is the match for CG72. CM93 -> match for CG93.

Final table should look like this:

ProductId Width  Thickness
CM72      4.2     12
CM73      1.2     76
CM84      3.5     32
CM93      2.1     31
CG72      4.2     12
CG93      2.1     31

Is there an effective way to do this in R? Many thanks!

Upvotes: 1

Views: 1625

Answers (1)

akrun
akrun

Reputation: 887048

We replace the 'G' with 'M' in the 'ProductId' of second dataset ('df2') using sub, get the index of row in 'df1' by matching with 'ProductId', subset the 2nd and 3rd column with only the rows based on the match ids, cbind with the second dataset 'df2' and then rbind with the 'df1'

dfN <- rbind(df1,  cbind(df2, df1[match(sub("G", "M", df2$ProductId), df1$ProductId), 2:3]))
row.names(dfN) <- NULL
dfN
#  ProductId Width Thickness
#1      CM72   4.2        12
#2      CM73   1.2        76
#3      CM84   3.5        32
#4      CM93   2.1        31
#5      CG72   4.2        12
#6      CG93   2.1        31

Or another option is with data.table, We get the datasets in a list, rbind, grouped by the substring of 'ProductId', loop through the other columns, get the non-NA elements and assign (:=) it to the columns

rbindlist(list(df1, df2), fill = TRUE)[, (2:3) := 
 lapply(.SD, function(x) x[!is.na(x)]), .(grp = sub("G", "M", ProductId))][]
#    ProductId Width Thickness
#1:      CM72   4.2        12
#2:      CM73   1.2        76
#3:      CM84   3.5        32
#4:      CM93   2.1        31
#5:      CG72   4.2        12
#6:      CG93   2.1        31

A corresponding tidyverse approach would be

library(tidyverse)
bind_rows(df1, df2) %>% 
     group_by(grp =  sub("G", "M", ProductId)) %>% 
     mutate_at(vars(Width:Thickness), funs(.[!is.na(.)])) %>%
     ungroup() %>%
     select(-grp)
data
df1 <- structure(list(ProductId = c("CM72", "CM73", "CM84", "CM93"), 
Width = c(4.2, 1.2, 3.5, 2.1), Thickness = c(12L, 76L, 32L, 
31L)), .Names = c("ProductId", "Width", "Thickness"),
 class =  "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(ProductId = c("CG72", "CG93")),
.Names = "ProductId", class = "data.frame", row.names = c(NA, 
 -2L))

Upvotes: 1

Related Questions