Yogesh Kumar

Reputation: 659

How to rename columns in Sparklyr in R?

This is the code I used in R against a Spark cluster; the error is shown below:

mydata <- spark_read_csv(spark_cluster, name = "rd_1", path = "IAF_Extracted_Data_Zipped.csv",
                         header = FALSE, delimiter = "|")

mydata %>% select(customer = V1, device_subscriber_id = V2, user_subscriber_id = V3,
                  user_id = V4, location_id = V5)

Error in .f(.x[[i]], ...) : object 'V1' not found

Upvotes: 1

Views: 1029

Answers (3)

Alper t. Turker

Reputation: 35249

If you want specific names, just provide a vector of names when reading the file:

columns <- c("customer", "device_subscriber_id", 
             "user_subscriber_id", "user_id", "location_id")

spark_read_csv(
   spark_cluster, name = "rd_1", path = "IAF_Extracted_Data_Zipped.csv",
   header = FALSE, columns = columns, delimiter = "|"
)

The number of names in the vector should match the number of columns in the input file.
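If the file has already been read without names, you can also rename afterwards by position. This avoids having to guess which default names were generated for a headerless file. The sketch below uses a local `data.frame` as a hypothetical stand-in for the Spark table, and `V1`…`V5` are assumed placeholder names. The same dplyr `rename()` call should also work on a `tbl_spark` through sparklyr's dplyr backend, but that has not been checked here. It also assumes the new names are listed in the same order as the file's columns.

```r
library(dplyr)

# Hypothetical local stand-in for the Spark table read with header = FALSE
mydata <- data.frame(V1 = 1:2, V2 = 3:4, V3 = 5:6, V4 = 7:8, V5 = 9:10)

new_names <- c("customer", "device_subscriber_id",
               "user_subscriber_id", "user_id", "location_id")

# setNames() builds c(customer = "V1", ...), i.e. new_name = "old_name" pairs
# matched by position; !!! splices those pairs into rename()
renamed <- mydata %>%
  rename(!!!setNames(colnames(mydata), new_names))

colnames(renamed)
```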

Upvotes: 0

Pasqui

Reputation: 621

The renaming convention goes the other way around (new name = old name).

You are looking for the following:

mydata %>% 
    select(V1 = customer,
           V2 = device_subscriber_id,
           V3 = user_subscriber_id,
           V4 = user_id,
           V5 = location_id) 
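For reference, here is a small sketch of the `new_name = old_name` form in dplyr's `select()` and `rename()`. It runs on a hypothetical local data frame, so its column names are illustrative only. It also shows how the two verbs differ: `select()` keeps only the listed columns, while `rename()` keeps every column.

```r
library(dplyr)

# Hypothetical local data frame with default-style column names
df <- data.frame(V1 = c("a", "b"), V2 = c(10, 20), V3 = c(TRUE, FALSE))

# select(): new_name = old_name, and unlisted columns (V3) are dropped
selected <- df %>% select(customer = V1, user_id = V2)

# rename(): new_name = old_name, and all other columns are kept
renamed <- df %>% rename(customer = V1, user_id = V2)

colnames(selected)  # "customer" "user_id"
colnames(renamed)   # "customer" "user_id" "V3"
```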

Upvotes: 2

niko

Reputation: 5281

Off the top of my head, you could try customer = mydata$V1, and similarly for the other variables (assuming V1, ... are column names of mydata).

Upvotes: 0
