I am trying to read files from OCI Object Storage from a local notebook, but I am getting the following error:
```
WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: oci://bucketv@namespace/20241124001206--file.parquet
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "oci"
```
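From what I understand, this exception is raised when Hadoop's `FileSystem` factory finds no implementation class registered for the URI scheme. A simplified Python sketch of that lookup (this is just the idea behind the error, not Hadoop's actual code):

```python
# Simplified illustration of the per-scheme lookup Hadoop performs.
# In real Hadoop, the mapping comes from fs.<scheme>.impl config keys
# and from ServiceLoader entries in the jars on the classpath.
registered = {
    "file": "LocalFileSystem",
    "hdfs": "DistributedFileSystem",
    # "oci" would need to be registered by the connector jar / config
}

def get_filesystem(scheme: str) -> str:
    impl = registered.get(scheme)
    if impl is None:
        raise ValueError(f'No FileSystem for scheme "{scheme}"')
    return impl
```

So my guess is that the connector jar or the `fs.oci.impl` setting is not taking effect.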
Code used:
```python
from pyspark.sql import SparkSession
from pyspark import SparkConf
import oci

# config dict with tenancy/user/fingerprint/key_file/region keys,
# loaded from the standard ~/.oci/config file
config = oci.config.from_file()

conf = SparkConf()
oci_hdfs_jar_path = "/Users/home/oci-hdfs-connector-3.3.4.1.4.2.jar"
conf.set("spark.jars", oci_hdfs_jar_path)
conf.set("spark.hadoop.fs.oci.client.auth.tenantId", config["tenancy"])
conf.set("spark.hadoop.fs.oci.client.auth.userId", config["user"])
conf.set("spark.hadoop.fs.oci.client.auth.fingerprint", config["fingerprint"])
conf.set("spark.hadoop.fs.oci.client.auth.privateKeyFile", config["key_file"])
conf.set("spark.hadoop.fs.oci.client.auth.region", config["region"])
conf.set("spark.hadoop.fs.oci.impl", "oracle.hadoop.fs.oci.OCIFileSystem")
# conf.set("fs.oci.client.hostname", "https://objectstorage.{0}.oraclecloud.com".format(config["region"]))
# conf.set("fs.oci.client.apache.connection.closing.strategy", "immediate")

spark = SparkSession.builder.appName('test').config(conf=conf).getOrCreate()

bucket_name = 'bucket'
namespace = 'namespace'
file_name = '20241124001206--file.parquet'
new_files = [f"oci://{bucket_name}@{namespace}/{file_name}"]

df1 = spark.read.parquet(*new_files)
df1.show()
```
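For reference, the object path I am building follows the connector's documented `oci://<bucket>@<namespace>/<object>` scheme; a minimal sketch of that assembly (bucket/namespace names here are placeholders):

```python
# Builds an OCI Object Storage URI in the oci://<bucket>@<namespace>/<object>
# form expected by the OCI HDFS connector. Names are placeholders.
def oci_object_uri(bucket: str, namespace: str, object_name: str) -> str:
    return f"oci://{bucket}@{namespace}/{object_name}"

uri = oci_object_uri("bucket", "namespace", "20241124001206--file.parquet")
print(uri)  # oci://bucket@namespace/20241124001206--file.parquet
```

So I don't think the path format itself is the problem.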
PySpark version: 3.4.1