Raul Gonzales

Reputation: 906

R not producing the same result when the data set source is changed

If I manually create two data frames, the code does what it was intended to do:

`df1 <- structure(list(CompanyName = c("Google", "Tesco")), .Names = "CompanyName", class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(CompanyVariationsNames = c("google plc", "tesco bank", "tesco insurance", "google finance", "google play")), .Names = "CompanyVariationsNames", class = "data.frame", row.names = c(NA, -5L))
`

library(dplyr)
library(splitstackshape)  # provides cSplit()

test <- df2 %>%
  rowwise() %>%
  mutate(CompanyName = as.character(Filter(length, lapply(df1$CompanyName, function(x) x[grepl(x, CompanyVariationsNames, ignore.case = TRUE)])))) %>%
  group_by(CompanyName) %>%
  summarise(Variation = paste(CompanyVariationsNames, collapse = ",")) %>%
  cSplit("Variation", ",")

this produces the following result:

   CompanyName Variation_1     Variation_2 Variation_3
1:      Google  google plc  google finance google play
2:       Tesco  tesco bank tesco insurance          NA

But if I import the data sets (using read.csv), I get the following error:

Error in mutate_impl(.data, dots) : Column CompanyName must be length 1 (the group size), not 0.

My data sets are rather large: df1 has 1000 rows and df2 has 54k rows. Is there a specific reason why the code works when the data is created manually but not when it is imported?

df1 contains the company names and df2 contains name variations of those companies.

help please!

Upvotes: 1

Views: 63

Answers (1)

LFB

Reputation: 686

Importing from CSV can be tricky. Check whether the default separator (comma) actually applies to your file. If not, you can change it by setting the sep argument to the character your file uses, e.g. read.csv(file_path, sep = ";"). This is a common problem in my country due to our local conventions.

In fact, if your standard is semicolons, read.csv2(file_path) will suffice.
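A minimal sketch of how to check this, assuming a small made-up semicolon-separated sample (the csv_text string and its Country column are hypothetical stand-ins for the real file, read here through the text argument instead of a path):

```r
# Hypothetical semicolon-separated content standing in for the real file
csv_text <- "CompanyName;Country\nGoogle;US\nTesco;UK"

# With the default comma separator, each line ends up in a single column
wrong <- read.csv(text = csv_text)
ncol(wrong)

# Either of these splits the line into the intended columns
right  <- read.csv(text = csv_text, sep = ";")
right2 <- read.csv2(text = csv_text)
names(right)
```

If ncol() or str() on the imported data frame shows fewer columns than expected, or column names that still contain the separator, the sep setting is the likely culprit.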

Also (to avoid further trouble), it is very common for CSV imports to mangle columns with decimal values, because here we use commas as decimal separators rather than dots. So it is worth checking whether this is a problem in any of the other columns of your file too.

If that applies to your file, set dec = "," in read.csv, e.g. read.csv(file_path, sep = ";", dec = ","). Note that read.csv2 already uses dec = "," by default.
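As a rough illustration of the decimal issue, assuming a made-up sample with a hypothetical Revenue column written with decimal commas (again read via the text argument rather than a real file path):

```r
# Hypothetical sample: semicolon-separated, decimal commas in Revenue
csv_text <- "CompanyName;Revenue\nGoogle;1,5\nTesco;2,25"

# Without dec = ",", the values are not recognised as numbers
as_text <- read.csv(text = csv_text, sep = ";")
is.numeric(as_text$Revenue)

# With dec = ",", the column is parsed as numeric
as_num <- read.csv(text = csv_text, sep = ";", dec = ",")
is.numeric(as_num$Revenue)
```

Running str() on the imported data frame is a quick way to spot columns that were expected to be numeric but came in as text.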

Upvotes: 1
