Reputation: 906
If I manually create two data frames, the code does what it is intended to do:
`df1 <- structure(list(CompanyName = c("Google", "Tesco")), .Names = "CompanyName", class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(CompanyVariationsNames = c("google plc", "tesco bank", "tesco insurance", "google finance", "google play")), .Names = "CompanyVariationsNames", class = "data.frame", row.names = c(NA, -5L))
`
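library(dplyr)            # rowwise(), mutate(), group_by(), summarise()
library(splitstackshape)  # cSplit()

# For each variation, find the company name it contains,
# then collect the variations per company and split them into columns.
test <- df2 %>%
  rowwise() %>%
  mutate(CompanyName = as.character(Filter(length,
    lapply(df1$CompanyName, function(x) x[grepl(x, CompanyVariationsNames, ignore.case = TRUE)])))) %>%
  group_by(CompanyName) %>%
  summarise(Variation = paste(CompanyVariationsNames, collapse = ",")) %>%
  cSplit("Variation", ",")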
This produces the following result:
   CompanyName Variation_1     Variation_2    Variation_3
1:      Google  google plc  google finance    google play
2:       Tesco  tesco bank tesco insurance             NA
But if I import a data set (using `read.csv`), I get the following error:
Error in mutate_impl(.data, dots) : Column CompanyName must be length 1 (the group size), not 0
My data sets are rather large: `df1` would have 1000 rows and `df2` would have 54k rows.
Is there a specific reason why the code works when the data set is created manually but not when the data is imported?
`df1` contains company names and `df2` contains name variations of those companies.
Help please!
Upvotes: 1
Views: 63
Reputation: 686
Importing from CSV can be tricky. Check whether the default separator (a comma) actually applies to your file. If not, you can change it by setting the `sep` argument to the character your file uses, e.g. `read.csv(file_path, sep = ";")`. This is a common problem in my country due to our local conventions.
In fact, if your files use semicolons as the standard, `read.csv2(file_path)` will suffice, because it defaults to `sep = ";"` and `dec = ","`.
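To check whether the separator is the issue, something like the following minimal sketch may help. It writes a small, made-up semicolon-separated file to a temporary path (the file contents and the `Country` column are assumptions for illustration only, not your data) and compares what the different calls return.

```r
# Hypothetical sample file, semicolon-separated (illustrative data only)
file_path <- tempfile(fileext = ".csv")
writeLines(c("CompanyName;Country", "Google;US", "Tesco;UK"), file_path)

# Default separator (comma): each line is read as one single field
wrong <- read.csv(file_path, stringsAsFactors = FALSE)
ncol(wrong)  # 1

# Explicit separator, or read.csv2, which uses ";" by default
right  <- read.csv(file_path, sep = ";", stringsAsFactors = FALSE)
right2 <- read.csv2(file_path, stringsAsFactors = FALSE)
ncol(right)  # 2

# str() shows column names and types, which is a quick way to spot a bad import
str(right)
```

If `str()` on your imported data shows a single column, or a column name that looks like several names glued together, the separator is probably wrong.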
Also, to avoid further trouble: CSV files often cause problems in columns with decimal values, because here we use commas rather than dots as decimal separators. It is worth checking whether this affects any of the other columns in your file.
If it does, you can set `dec = ","` in either `read.csv` or `read.csv2`, e.g. `read.csv(file_path, sep = ";", dec = ",")`.
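As a rough illustration of the decimal issue, the sketch below reads a made-up file (the `Revenue` column and its values are assumptions for illustration only) with and without `dec = ","` and compares the resulting column types.

```r
# Hypothetical sample file with comma decimals (illustrative data only)
file_path <- tempfile(fileext = ".csv")
writeLines(c("CompanyName;Revenue", "Google;1,5", "Tesco;2,25"), file_path)

# Without dec = ",", the values cannot be parsed as numbers and stay text
as_text <- read.csv(file_path, sep = ";", stringsAsFactors = FALSE)
class(as_text$Revenue)  # "character"

# With dec = ",", the column is read as numeric
as_num <- read.csv(file_path, sep = ";", dec = ",", stringsAsFactors = FALSE)
class(as_num$Revenue)   # "numeric"
```

A column you expect to be numeric showing up as character (or as a factor in older R versions) is a typical sign that the decimal separator does not match.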
Upvotes: 1