Surbhi Jain

Reputation: 150

Connecting local PySpark to OCI Object Storage

I am trying to read files from OCI Object Storage locally (in a notebook) but am getting an error.

WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: oci://bucketv@namespace/20241124001206--file.parquet
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "oci"

Code used:

from pyspark.sql import SparkSession
from pyspark import SparkConf

# config is an OCI config dict (tenancy, user, fingerprint, key_file, region),
# e.g. as loaded by oci.config.from_file()
conf = SparkConf()
oci_hdfs_jar_path = "/Users/home/oci-hdfs-connector-3.3.4.1.4.2.jar"
conf.set("spark.jars", oci_hdfs_jar_path)
conf.set("spark.hadoop.fs.oci.client.auth.tenantId", config["tenancy"])
conf.set("spark.hadoop.fs.oci.client.auth.userId", config["user"])
conf.set("spark.hadoop.fs.oci.client.auth.fingerprint", config["fingerprint"])
conf.set("spark.hadoop.fs.oci.client.auth.privateKeyFile", config["key_file"])
conf.set("spark.hadoop.fs.oci.client.auth.region", config["region"])
conf.set("spark.hadoop.fs.oci.impl", "oracle.hadoop.fs.oci.OCIFileSystem")
#conf.set("fs.oci.client.hostname", "https://objectstorage.{0}.oraclecloud.com".format(config["region"]))
#conf.set("fs.oci.client.apache.connection.closing.strategy", "immediate")
spark = SparkSession.builder.appName('test').config(conf=conf).getOrCreate()

new_files=[]
bucket_name = 'bucket'
namespace='namespace'
file_name='20241124001206--file.parquet'
new_files.append(f"oci://{bucket_name}@{namespace}/{file_name}")
df1 = spark.read.parquet(*new_files)
df1.show()

pyspark version - 3.4.1
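
For reference, the "No FileSystem for scheme" exception is what Hadoop raises when no FileSystem class is registered for the oci:// scheme. Below is a minimal sketch of how that mapping is usually configured for a local session; the com.oracle.bmc.hdfs.BmcFilesystem class name is the one shipped with the OCI HDFS connector, the jar path is copied from the question, and the rest (app name, that the connector's transitive dependencies such as the OCI Java SDK are also on the classpath) is assumed:

from pyspark.sql import SparkSession

# Sketch only: assumes the oci-hdfs-connector jar (plus its transitive
# dependencies, e.g. the OCI Java SDK) is available at this local path.
connector_jar = "/Users/home/oci-hdfs-connector-3.3.4.1.4.2.jar"

spark = (
    SparkSession.builder
    .appName("oci-test")
    .config("spark.jars", connector_jar)
    # Map the oci:// scheme to the connector's FileSystem implementation.
    .config("spark.hadoop.fs.oci.impl", "com.oracle.bmc.hdfs.BmcFilesystem")
    .getOrCreate()
)

# A read against oci://bucket@namespace/... would additionally need the
# fs.oci.client.auth.* credentials from the question to be set on the session.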

Upvotes: 0

Views: 58

Answers (0)
