Reputation: 77
I am a beginner in R, and I would appreciate help from experts for this problem. I have Products table like this:
Table1 : Existing Products
ProductId Width Thickness
CM72 4.2 12
CM73 1.2 76
CM84 3.5 32
CM93 2.1 31
Table2 : New Products
ProductId
CG72
CG93
I want to append the ProductId of the NewProducts table to the existing Products table. And I want to copy the Width and Thickness and Length from the similar numbered existing product. CM72 -> is the match for CG72. CM93 -> match for CG93.
Final table should look like this:
ProductId Width Thickness
CM72 4.2 12
CM73 1.2 76
CM84 3.5 32
CM93 2.1 31
CG72 4.2 12
CG93 2.1 31
Is there an effective way to do this in R? Many thanks!
Upvotes: 1
Views: 1625
Reputation: 887048
We replace the 'G' with 'M' in the 'ProductId' of second dataset ('df2') using sub
, get the index of row in 'df1' by match
ing with 'ProductId', subset the 2nd and 3rd column with only the rows based on the match
ids, cbind
with the second dataset 'df2' and then rbind
with the 'df1'
dfN <- rbind(df1, cbind(df2, df1[match(sub("G", "M", df2$ProductId), df1$ProductId), 2:3]))
row.names(dfN) <- NULL
dfN
# ProductId Width Thickness
#1 CM72 4.2 12
#2 CM73 1.2 76
#3 CM84 3.5 32
#4 CM93 2.1 31
#5 CG72 4.2 12
#6 CG93 2.1 31
Or another option is with data.table
, We get the datasets in a list
, rbind
, grouped by the substring of 'ProductId', loop through the other columns, get the non-NA elements and assign (:=
) it to the columns
rbindlist(list(df1, df2), fill = TRUE)[, (2:3) :=
lapply(.SD, function(x) x[!is.na(x)]), .(grp = sub("G", "M", ProductId))][]
# ProductId Width Thickness
#1: CM72 4.2 12
#2: CM73 1.2 76
#3: CM84 3.5 32
#4: CM93 2.1 31
#5: CG72 4.2 12
#6: CG93 2.1 31
A corresponding tidyverse
approach would be
library(tidyverse)
bind_rows(df1, df2) %>%
group_by(grp = sub("G", "M", ProductId)) %>%
mutate_at(vars(Width:Thickness), funs(.[!is.na(.)])) %>%
ungroup() %>%
select(-grp)
data
df1 <- structure(list(ProductId = c("CM72", "CM73", "CM84", "CM93"),
Width = c(4.2, 1.2, 3.5, 2.1), Thickness = c(12L, 76L, 32L,
31L)), .Names = c("ProductId", "Width", "Thickness"),
class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(ProductId = c("CG72", "CG93")),
.Names = "ProductId", class = "data.frame", row.names = c(NA,
-2L))
Upvotes: 1