Arun Gunalan

Reputation: 824

How to read the csv and convert to RDD in sparkR

As an R programmer, I want to use R as an interface to Spark, so I installed the SparkR package in R.

I'm new to SparkR. I want to perform some operations on particular fields of the records in a CSV file, so I'm trying to read the CSV file and convert it to an RDD.

This is the code I tried:
library(SparkR)
sc <- sparkR.init(master = "local") # creates the Spark context
data <- read.csv(sc, "/home/data1.csv")
# This throws an error that points to read.table

The data I have to load and convert: https://i.sstatic.net/sj78x.png

If I'm doing this wrong, how should I read this CSV data and convert it to an RDD in SparkR?

TIA

Upvotes: 0

Views: 789

Answers (3)

maxmithun

Reputation: 1137

The code below (Scala, Spark 2.0+) reads a CSV file with a header row into a DataFrame; call .rdd on it if you need an RDD. All the best

val csvDF = spark.read.option("header", "true").csv(filename)
val csvRdd = csvDF.rdd

Upvotes: 0

user8947768

Reputation: 1

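In recent SparkR versions (2.0+), which work with SparkDataFrames rather than RDDs:

read.df(path, source = "csv", header = "true")

In Spark 1.x, where read.df takes an SQLContext rather than the Spark context:

sqlContext <- sparkRSQL.init(sc)
read.df(sqlContext, path, source = "com.databricks.spark.csv", header = "true")

with the spark-csv package added through the spark.jars.packages property:

spark.jars.packages  com.databricks:spark-csv_2.10:1.4.0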

Upvotes: 0

Alvaro Agea

Reputation: 470

I believe the problem is the header line; if you remove it, it should work.

See also: "How do I convert csv file to rdd"

--edited--

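With this code you can test SparkR with CSVs, but you need to remove the header line from your CSV file first.

# Each element of the RDD is one line of the file (header already removed)
lines <- textFile(sc, "/home/data1.csv")
# Split every line into its comma-separated fields
# (in Spark 1.4+ the RDD API is internal: prefix these calls with SparkR:::)
csvElements <- lapply(lines, function(line) {
  strsplit(line, ",")[[1]]
})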
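Before running it on Spark, the per-line function can be checked in plain R. This is a minimal sketch, assuming simple comma-separated lines with no quoted fields; `parse_csv_line`, `drop_header` and the sample lines are hypothetical names and made-up data for illustration only:

```r
# Hypothetical helper: split one CSV line into its fields.
# Assumes plain comma-separated values (no quoted commas).
parse_csv_line <- function(line) {
  strsplit(line, ",", fixed = TRUE)[[1]]
}

# Hypothetical helper: drop the header line from a local character vector.
drop_header <- function(lines) {
  lines[-1]
}

# Made-up lines standing in for the contents of the CSV file
sample_lines <- c("id,name,score", "1,alice,90", "2,bob,85")

# Base R lapply over a local vector, mirroring the RDD version
parsed <- lapply(drop_header(sample_lines), parse_csv_line)
print(parsed)
```

On the RDD itself, the same function can be passed as `lapply(lines, parse_csv_line)`. The header cannot be dropped with `[-1]` there, because an RDD is not indexed like an R vector, which is why removing it from the file is the simplest route.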

Upvotes: 1
