Reputation: 824
As an R programmer, I want to use R as an interface to Spark, so I installed the SparkR package in R.
I'm new to SparkR. I want to perform some operations on particular data in a CSV file, so I'm trying to read the file and convert it to an RDD.
This is the code I wrote:
sc <- sparkR.init(master="local") # create the Spark context
data <- read.csv(sc, "/home/data1.csv")
# This throws an error telling me to use read.table
The data I need to load and convert: https://i.sstatic.net/sj78x.png
If I am doing this wrong, how do I read this CSV data and convert it to an RDD in SparkR?
Thanks in advance.
Upvotes: 0
Views: 789
Reputation: 1137
The Scala code below will let you read a CSV with a header (Spark 2.x). Note that it returns a DataFrame; call .rdd on it if you need an RDD. All the best.
val csvrdd = spark.read.option("header", "true").csv(filename)
Upvotes: 0
Reputation: 1
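In recent SparkR versions (2.0+):
read.df(path, source = "csv")
In Spark 1.x:
read.df(sqlContext, path, source = "com.databricks.spark.csv")  # sqlContext <- sparkRSQL.init(sc)
with the spark-csv package loaded through this configuration entry:
spark.jars.packages com.databricks:spark-csv_2.10:1.4.0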
Upvotes: 0
Reputation: 470
I believe the problem is the header line; if you remove it, it should work.
How do I convert csv file to rdd
--edited--
With this code you can test SparkR with CSVs, but you need to remove the header line from your CSV file.
# In SparkR 1.4+ the RDD API is not exported; use SparkR:::textFile and SparkR:::lapply there
lines <- textFile(sc, "/home/data1.csv")
csvElements <- lapply(lines, function(line) {
  # line is a single CSV line; split it into its fields
  strsplit(line, ",")[[1]]
})
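To try the per-line parsing without a Spark cluster, here is a minimal base R sketch of the same idea. The sample lines, the column names, and the helper `parse_line` are hypothetical stand-ins for the contents of /home/data1.csv, and the naive comma split assumes no field contains a quoted comma.

```r
# Hypothetical sample standing in for the lines of /home/data1.csv
lines <- c("id,name,score", "1,alice,90", "2,bob,85")

# Keep the header's column names, then drop the header line itself
header <- strsplit(lines[1], ",", fixed = TRUE)[[1]]
body <- lines[-1]

# Split one CSV line into a named list of fields (hypothetical helper)
parse_line <- function(line) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  setNames(as.list(fields), header)
}

# Apply the parser to every data line, as lapply would over the RDD
csvElements <- lapply(body, parse_line)
```

A function shaped like `parse_line` is what you would pass to `lapply` over the `lines` RDD in the snippet above; all fields come back as character strings, so numeric columns still need `as.numeric`.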
Upvotes: 1