newinPython

Reputation: 313

Py4JJavaError: An error occurred while calling o389.csv

I'm new to PySpark and I'm running it on Databricks. My data is stored in Azure Data Lake Storage (ADLS). I'm trying to read a CSV file from ADLS into a PySpark DataFrame, so I wrote the following code:

import pyspark
from pyspark import SparkContext 
from pyspark import SparkFiles

df = sqlContext.read.csv(SparkFiles.get("dbfs:mycsv path in ADSL/Data.csv"),
                         header=True, inferSchema=True)

But I'm getting this error message:

Py4JJavaError: An error occurred while calling o389.csv.

Can you suggest how to fix this error?

Upvotes: 3

Views: 4015

Answers (1)

Alex Ott

Reputation: 87154

The SparkFiles class is intended for accessing files shipped as part of the Spark job (for example, files added with SparkContext.addFile). If you just need to read a CSV file that is already available on ADLS, pass its path directly to spark.read.csv, like:

df = spark.read.csv("dbfs:mycsv path in ADSL/Data.csv", 
  header=True, inferSchema=True)

It's also better not to use sqlContext; it's kept only for backward compatibility. Use the spark session object instead.
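To make the difference concrete, here is a minimal sketch of the direct read wrapped in a small helper. The helper name read_csv is hypothetical, and it assumes the session object is passed in (Databricks notebooks predefine it as spark). The path in the usage comment is the placeholder from the question, not a real location.

```python
def read_csv(spark, path):
    """Read a CSV file into a DataFrame straight from a storage URI.

    `spark` is assumed to be the SparkSession (predefined as `spark` in
    Databricks notebooks). `path` is handed to the reader unchanged,
    without wrapping it in SparkFiles.get().
    """
    return spark.read.csv(path, header=True, inferSchema=True)


# In a Databricks notebook, with the placeholder path from the question:
# df = read_csv(spark, "dbfs:mycsv path in ADSL/Data.csv")
# df.printSchema()
```

Taking the session as a parameter keeps the helper free of any reference to sqlContext and makes it easy to reuse outside a notebook, where you would create the session yourself.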

Upvotes: 1
