Reputation: 659
This is the code I used in R on a Spark cluster; the error it produces is shown below.
mydata <- spark_read_csv(spark_cluster, name = "rd_1", path = "IAF_Extracted_Data_Zipped.csv", header = FALSE, delimiter = "|")
mydata %>% select(customer = V1, device_subscriber_id = V2, user_subscriber_id = V3, user_id = V4, location_id = V5)
Error in .f(.x[[i]], ...) : object 'V1' not found
Upvotes: 1
Views: 1029
Reputation: 35249
If you want specific names just provide a vector of names on read:
columns <- c("customer", "device_subscriber_id",
             "user_subscriber_id", "user_id", "location_id")
spark_read_csv(
  spark_cluster, name = "rd_1", path = "IAF_Extracted_Data_Zipped.csv",
  header = FALSE, columns = columns, delimiter = "|"
)
The number of names in columns should match the number of columns in the input.
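As a rough way to check that the counts match before reading, the sketch below counts the fields on the first line of a pipe-delimited file in base R. It assumes the file is readable from the local R session (which may not hold for a file that lives only on the cluster), and the temporary file and its contents are hypothetical stand-ins for the real input. Note that strsplit() drops a trailing empty field, so a line ending in "|" would be undercounted.

```r
# Hypothetical sample file standing in for the real input.
path <- tempfile(fileext = ".csv")
writeLines(c("a|b|c|d|e", "f|g|h|i|j"), path)

columns <- c("customer", "device_subscriber_id",
             "user_subscriber_id", "user_id", "location_id")

# Read only the first line and split it on a literal "|".
first_line <- readLines(path, n = 1)
n_fields <- length(strsplit(first_line, "|", fixed = TRUE)[[1]])

# Stop early if the name vector and the file disagree.
stopifnot(n_fields == length(columns))
```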
Upvotes: 0
Reputation: 621
The renaming convention goes the other way around (new name = old name).
You are looking for the following:
mydata %>%
select(V1 = customer,
V2 = device_subscriber_id,
V3 = user_subscriber_id,
V4 = user_id,
V5 = location_id)
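For reference, here is a minimal sketch of how dplyr's select() treats names, run on a small local data frame rather than a Spark table. The data frame df and its contents are hypothetical; the only assumption is that columns named V1 and V2 really exist. The name to the left of = is the new name and the name to the right must already be a column, so it can help to inspect names() first.

```r
library(dplyr)

# Hypothetical local data frame standing in for the Spark table.
df <- data.frame(V1 = c("a", "b"), V2 = c(1, 2))

# Check which column names actually exist before renaming.
names(df)

# In select(), the new name goes on the left, the existing column on the right.
renamed <- df %>% select(customer = V1, user_id = V2)
names(renamed)
```

If the name on the right is not among the existing columns, select() fails because it cannot find that column.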
Upvotes: 2
Reputation: 5281
Off the top of my head, you could try customer = mydata$V1, and similarly for the other variables (assuming V1, ... are column names of mydata).
Upvotes: 0