tbuda

Reputation: 43

Bluemix Apache Spark Service - Scala - reading a file

This is a basic question, but I'm trying to read the contents of a file in Scala from a Bluemix notebook on the Analytics for Apache Spark service, and I keep getting authentication errors. Does anyone have an example of Scala authentication for accessing a file? Thank you in advance!

I tried the following simple script:

val file = sc.textFile("swift://notebooks.keystone/kdd99.data")
file.take(1)

I also tried:

def setConfig(name:String) : Unit = {
  val pfx = "fs.swift.service." + name
  val conf = sc.getConf
  conf.set(pfx + "auth.url", "hardcoded")
  conf.set(pfx + "tenant", "hardcoded")
  conf.set(pfx + "username", "hardcoded")
  conf.set(pfx + "password", "hardcoded")
  conf.set(pfx + "apikey",  "hardcoded")
  conf.set(pfx + "auth.endpoint.prefix", "endpoints")
}
setConfig("keystone")

I also tried this script from a previous question:

import scala.collection.breakOut
val name= "keystone"
val YOUR_DATASOURCE = """auth_url:https://identity.open.softlayer.com
project: hardcoded
project_id: hardcoded
region: hardcoded
user_id: hardcoded
domain_id: hardcoded
domain_name: hardcoded
username: hardcoded
password: hardcoded
filename: hardcoded
container: hardcoded
tenantId: hardcoded
"""

val settings:Map[String,String] = YOUR_DATASOURCE.split("\\n").
    map(l=>(l.split(":",2)(0).trim(), l.split(":",2)(1).trim()))(breakOut)

val conf = sc.getConf
conf.set("fs.swift.service.keystone.auth.url", settings.getOrElse("auth_url", ""))
conf.set("fs.swift.service.keystone.tenant", settings.getOrElse("tenantId", ""))
conf.set("fs.swift.service.keystone.username", settings.getOrElse("username", ""))
conf.set("fs.swift.service.keystone.password", settings.getOrElse("password", ""))
conf.set("fs.swift.service.keystone.apikey", settings.getOrElse("password", ""))
conf.set("fs.swift.service.keystone.auth.endpoint.prefix", "endpoints")
println("sett: "+ settings.getOrElse("auth_url","")) 
val file = sc.textFile("swift://notebooks.keystone/kdd99.data")

/* The following line gives errors */
file.take(1)

The error is below:

Name: org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException Message: Missing mandatory configuration option: fs.swift.service.keystone.auth.url

Edit

This would be a good alternative for Python. I tried the code below, with "spark" as the configname, for two different files:

def set_hadoop_config(credentials):
    prefix = "fs.swift.service." + credentials['name'] 
    hconf = sc._jsc.hadoopConfiguration()
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens')
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials['project_id'])
    hconf.set(prefix + ".username", credentials['user_id'])
    hconf.set(prefix + ".password", credentials['password'])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials['region'])
    hconf.setBoolean(prefix + ".public", True)

Upvotes: 3

Views: 572

Answers (2)

NSHUKLA

Reputation: 81

To access a file in Object Storage from Scala, the following sequence of commands works in a Scala notebook. (The credentials are populated in the cell when you use the "Insert to code" link for the file shown under Data Source in the notebook.)

IN[1]:

var credentials = scala.collection.mutable.HashMap[String, String](
  "auth_url"->"https://identity.open.softlayer.com",
  "project"->"object_storage_b3c0834b_0936_4bbe_9f29_ef45e018cec9",
  "project_id"->"68d053dff02e42b1a947457c6e2e3290",
  "region"->"dallas",
  "user_id"->"e7639268215e4830a3662f708e8c4a5c",
  "domain_id"->"2df6373c549e49f8973fb6d22ab18c1a",
  "domain_name"->"639347",
  "username"->"Admin_XXXXXXXXXXXX",
  "password"->"""XXXXXXXXXX""",
  "filename"->"2015_small.csv",
  "container"->"notebooks",
  "tenantId"->"sefe-f831d4ccd6da1f-42a9cf195d79"
)

IN[2]:

credentials("name")="keystone"

IN[3]:

def setHadoopConfig(name: String, tenant: String, url: String, username: String, password: String, region: String) = {
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.url",url+"/v3/auth/tokens")
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.endpoint.prefix","endpoints")
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.tenant",tenant)
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.username",username)
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.password",password)
    sc.hadoopConfiguration.setInt(f"fs.swift.service.$name.http.port",8080)
    sc.hadoopConfiguration.set(f"fs.swift.service.$name.region",region)
    sc.hadoopConfiguration.setBoolean(f"fs.swift.service.$name.public",true)
}

IN[4]:

setHadoopConfig(credentials("name"), credentials("project_id"), credentials("auth_url"), credentials("user_id"), credentials("password"), credentials("region"))

IN[5]:

var testcount = sc.textFile("swift://notebooks.keystone/2015_small.csv")
testcount.count()

IN[6]:

testcount.take(1)
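
As a rough sketch of what IN[3] configures, the same settings can be written as a pure function that returns the property names and values, which makes it easy to inspect them before applying them. The object name SwiftSettings and the method forService are hypothetical and are not part of the answer above. Note two differences from the attempts in the question: the prefix ends with a dot, and the values are meant for sc.hadoopConfiguration. As far as I know, sc.getConf returns a copy of the SparkConf, so values set on it probably never reach the Hadoop Swift driver.

```scala
// Hypothetical helper, not part of the original answer.
object SwiftSettings {
  // Returns the Hadoop properties that IN[3] sets, as plain strings.
  // Hadoop stores every property as a string, so "8080" and "true"
  // are equivalent to the setInt / setBoolean calls used in IN[3].
  def forService(name: String, tenant: String, url: String,
                 username: String, password: String,
                 region: String): Map[String, String] = {
    val prefix = s"fs.swift.service.$name." // note the trailing dot
    Map(
      (prefix + "auth.url") -> (url + "/v3/auth/tokens"),
      (prefix + "auth.endpoint.prefix") -> "endpoints",
      (prefix + "tenant") -> tenant,
      (prefix + "username") -> username,
      (prefix + "password") -> password,
      (prefix + "http.port") -> "8080",
      (prefix + "region") -> region,
      (prefix + "public") -> "true"
    )
  }
}

// In the notebook, where `sc` is the SparkContext supplied by the service,
// the map could be applied like this (left as a comment because it needs Spark):
// SwiftSettings.forService(credentials("name"), credentials("project_id"),
//   credentials("auth_url"), credentials("user_id"),
//   credentials("password"), credentials("region"))
//   .foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }
```

Keeping the key construction separate from the Spark call means the keys can be printed and checked against the name in the "Missing mandatory configuration option" message.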

Upvotes: 2

charles gomes

Reputation: 2155

I think you need to use "spark" as the configname instead of "keystone", since you are accessing Object Storage from the IBM Bluemix notebook UI.

sc.textFile("swift://notebooks.spark/2015_small.csv")

Here is a working example:

https://console.ng.bluemix.net/data/notebooks/4dda9ee7-bf26-4ebc-bccf-dcb1b7ef63c8/view?access_token=37bff7ab682ee255b753fca485d49de50fed69d2a25217a7c748dd1463222c3b

Note: consider changing the container name to match your Object Storage. The host part of the URL has the form containername.configname.

Also put your own credentials into the YOUR_DATASOURCE variable in the example above.

"notebooks" is a default container.
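
To make the containername.configname convention concrete, here is a minimal sketch that assembles the URL from its parts. The object SwiftUrl and its build method are hypothetical names introduced only for illustration; the configname must match the name used in the fs.swift.service.<configname>.* settings.

```scala
// Hypothetical helper, not part of the linked notebook.
object SwiftUrl {
  // container:  Object Storage container, e.g. "notebooks"
  // configName: name used in the fs.swift.service.<configName>.* settings
  // fileName:   name of the object inside the container
  def build(container: String, configName: String, fileName: String): String =
    s"swift://$container.$configName/$fileName"

  // Evaluates to "swift://notebooks.spark/2015_small.csv"
  val example: String = build("notebooks", "spark", "2015_small.csv")
}
```

The result can then be passed to sc.textFile in the notebook.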

Thanks, Charles.

Upvotes: 3
