Raul Gonzales
Raul Gonzales

Reputation: 906

Transpose partialialy Matched data from row to columns

I have a master column that looks like this:

CompanyName Google Tesco

I also have another data frame that looks like this:

CompanyVariationsNames google plc tesco bank tesco insurance google finance google play

I need the data to look like this:

Company Name Variation1 Variation2 Variation3 Google google plc google finance google play Tesco tesco bank tesco insurance

this is just a sample as i have 750 company names that have returned about 5000 company name variations. i have managed to get all the matching customer variations in one column with the below code as the CompanyVariationsNames column comes from a pool of of over 100k company names:

matched1 = subset(DF, grepl(paste(Set1, collapse = "|"), DF$Customer_Name, ignore.case = T )) but i cant find a way to make them look like the above mentioned result. any advice would be grateful!

Upvotes: 1

Views: 61

Answers (1)

Prem
Prem

Reputation: 11965

One option could be

library(dplyr)
library(splitstackshape)

df2 %>%
  rowwise() %>%
  mutate(CompanyName = ifelse(is.null(as.character(Filter(length, 
                                           lapply(df1$CompanyName, function(x) x[grepl(x, CompanyVariationsNames, ignore.case=T)])))),
                              NA,
                              as.character(Filter(length, 
                                                  lapply(df1$CompanyName, function(x) x[grepl(x, CompanyVariationsNames, ignore.case=T)]))))) %>%
  filter(!is.na(CompanyName)) %>%
  group_by(CompanyName) %>%
  summarise(Variation = paste(CompanyVariationsNames, collapse=",")) %>%
  cSplit("Variation", ",")

Output is:

   CompanyName Variation_1     Variation_2 Variation_3
1:      Google  google plc  google finance google play
2:       Tesco  tesco bank tesco insurance          NA

Sample data:

df1 <- structure(list(CompanyName = c("Google", "Tesco")), .Names = "CompanyName", class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(CompanyVariationsNames = c("google plc", "tesco bank", 
"tesco insurance", "google finance", "google play")), .Names = "CompanyVariationsNames", class = "data.frame", row.names = c(NA, 
-5L))

Update: Added logic to handle below error

Error in mutate_impl(.data, dots) : Column "CompanyName" must be length 1 (the group size), not 0

Upvotes: 1

Related Questions