Faisal Khan

Reputation: 77

How to read a CSV file from GCS using Spark in Java?

I am trying to read a CSV file stored in GCS using Spark. I have a simple Spark Java project that does nothing but read a CSV file. It uses the following code:

SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("Hello world");
SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();

Dataset<Row> dataset = sparkSession.read()
        .option("header", true)
        .option("sep", ",")
        .option("quote", "\"")
        .csv("gs://abc/WDC_age.csv");

But it throws the following error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: gs

Can anyone help me with this? I just want to read a CSV file from GCS using Spark.

Thanks in advance :)

Upvotes: 1

Views: 745

Answers (2)

Faisal Khan

Reputation: 77

In my case, I just added the following dependency to my pom.xml file:

<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>gcs-connector</artifactId>
    <version>hadoop3-2.2.4</version>
</dependency>

and it worked for me.

Upvotes: 1

Dagang Wei

Reputation: 26458

The error "No FileSystem for scheme: gs" indicates that Spark couldn't find the GCS connector. I guess you are not running on a Dataproc cluster, so you might need to install the connector yourself: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
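
If the connector jar is on the classpath and the gs scheme is still not picked up, it may help to register the filesystem classes explicitly. Below is a minimal sketch of the settings that are commonly used for this. The class name GcsSparkProps and the method build are hypothetical helpers, not part of Spark or the connector, and the two service-account keys are an assumption based on the 2.x connector line (newer connector versions may use different authentication keys). The "spark.hadoop." prefix is what makes Spark copy each entry into the Hadoop configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GcsSparkProps {

    // Hypothetical helper: collects Spark properties that register the GCS connector.
    // keyFilePath may be null if credentials are provided some other way.
    public static Map<String, String> build(String keyFilePath) {
        Map<String, String> props = new LinkedHashMap<>();
        // Map the gs:// scheme to the connector's FileSystem implementations.
        props.put("spark.hadoop.fs.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        props.put("spark.hadoop.fs.AbstractFileSystem.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");
        if (keyFilePath != null) {
            // Assumption: service-account key names as used by the 2.x connector.
            props.put("spark.hadoop.google.cloud.auth.service.account.enable", "true");
            props.put("spark.hadoop.google.cloud.auth.service.account.json.keyfile", keyFilePath);
        }
        return props;
    }

    public static void main(String[] args) {
        // Print the properties; the key file path here is a placeholder.
        build("/path/to/key.json").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the question's code, these could be applied before creating the session with something like GcsSparkProps.build(path).forEach(conf::set), where conf is the SparkConf. Whether the explicit fs.gs.impl entries are needed at all depends on the connector version, so treat this as something to try rather than a required step.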

Upvotes: 0
