Reputation: 906
I have a master column that looks like this:
CompanyName
Google
Tesco
I also have another data frame that looks like this:
CompanyVariationsNames
google plc
tesco bank
tesco insurance
google finance
google play
I need the data to look like this:
Company Name Variation1 Variation2 Variation3
Google google plc google finance google play
Tesco tesco bank tesco insurance
this is just a sample as i have 750 company names that have returned about 5000 company name variations.
i have managed to get all the matching customer variations in one column with the below code as the CompanyVariationsNames
column comes from a pool of of over 100k company names:
matched1 = subset(DF, grepl(paste(Set1, collapse = "|"), DF$Customer_Name, ignore.case = T ))
but i cant find a way to make them look like the above mentioned result.
any advice would be grateful!
Upvotes: 1
Views: 61
Reputation: 11965
One option could be
library(dplyr)
library(splitstackshape)
df2 %>%
rowwise() %>%
mutate(CompanyName = ifelse(is.null(as.character(Filter(length,
lapply(df1$CompanyName, function(x) x[grepl(x, CompanyVariationsNames, ignore.case=T)])))),
NA,
as.character(Filter(length,
lapply(df1$CompanyName, function(x) x[grepl(x, CompanyVariationsNames, ignore.case=T)]))))) %>%
filter(!is.na(CompanyName)) %>%
group_by(CompanyName) %>%
summarise(Variation = paste(CompanyVariationsNames, collapse=",")) %>%
cSplit("Variation", ",")
Output is:
CompanyName Variation_1 Variation_2 Variation_3
1: Google google plc google finance google play
2: Tesco tesco bank tesco insurance NA
Sample data:
df1 <- structure(list(CompanyName = c("Google", "Tesco")), .Names = "CompanyName", class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(CompanyVariationsNames = c("google plc", "tesco bank",
"tesco insurance", "google finance", "google play")), .Names = "CompanyVariationsNames", class = "data.frame", row.names = c(NA,
-5L))
Update: Added logic to handle below error
Error in mutate_impl(.data, dots) : Column "CompanyName" must be length 1 (the group size), not 0
Upvotes: 1