Reputation: 179
I have a couple of .csv files in C:\Users\USER_NAME\Documents that are more than 2 GB in size. I want to use Apache Spark to read the data out of them in R. I am using Microsoft R Open 3.3.1 with Spark 2.0.1.
I am stuck on reading the .csv files with the spark_read_csv(...) function from the sparklyr package. It asks for a file path that starts with file://. I want to know the proper file path for my case: one that starts with file:// and ends with the name of a file in the .../Documents directory.
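For illustration, this is roughly the call I am attempting; the file name my_data.csv is a placeholder, and I am not sure the file:// URL is formed correctly for a Windows path:

library(sparklyr)

# Connect to a local Spark instance (connection settings are illustrative)
sc <- spark_connect(master = "local")

# On Windows a file:// URL uses forward slashes and three slashes
# before the drive letter; my_data.csv is a placeholder file name
df <- spark_read_csv(sc,
                     name = "my_data",
                     path = "file:///C:/Users/USER_NAME/Documents/my_data.csv")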
Upvotes: 1
Views: 1012
Reputation: 309
I had a similar problem. In my case, the .csv file needed to be put into the HDFS file system before calling spark_read_csv on it. I suspect you have the same issue. If your cluster is also running HDFS, copy the file in first with:

hdfs dfs -put <local-file> <hdfs-destination>
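After that, reading it from R looked roughly like this (a sketch; the master setting, HDFS path, and table name are placeholders for your setup):

library(sparklyr)

# Connect to the cluster; the master value depends on your deployment
# ("yarn-client" here is only an example)
sc <- spark_connect(master = "yarn-client")

# Read the file that was copied into HDFS with hdfs dfs -put;
# the HDFS path and table name below are placeholders
df <- spark_read_csv(sc,
                     name = "my_table",
                     path = "hdfs:///user/USER_NAME/my_data.csv")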
Best, Felix
Upvotes: 1