Reputation: 24626
In the notebook, there is an option for inserting code from an Object Storage file. However, when I click the link, it populates a notebook cell with a set of variables, e.g.:
auth_url : https://identity.open.softlayer.com
project : object_storage_***
project_id : ****
region : dallas
user_id : *****
domain_id : *****
domain_name : *****
username : user_*****
password : *****
filename : block_1.csv
container : notebooks
tenantId : ****
How do I use this information in a Spark command to load the data? Presumably something like this:
scala> val data = sc.textFile( ... )
Question: What is the exact command?
Upvotes: 0
Views: 698
Reputation: 321
I am using a Jupyter notebook with the Spark as a Service app in Bluemix, with the Scala 2.10 kernel.
I was able to access a file stored in the Swift Object Storage using the code below. I think this method is slightly easier, since I was able to select the file in the object store in the Jupyter notebook and just use the insert code function to add the code to my notebook without modification. Here is the snippet:
def setConfig(credentials: scala.collection.mutable.HashMap[String, String]) = {
  val prefix = "fs.swift.service." + credentials("name")
  // The fs.swift.* settings must go on the Hadoop configuration;
  // setting them on the SparkConf after the context exists has no effect
  val hconf = sc.hadoopConfiguration
  hconf.set(prefix + ".auth.url", credentials("auth_url") + "/v3/auth/tokens")
  hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
  hconf.set(prefix + ".tenant", credentials("project_id"))
  hconf.set(prefix + ".username", credentials("user_id"))
  hconf.set(prefix + ".password", credentials("password"))
  hconf.set(prefix + ".http.port", "8080")
  hconf.set(prefix + ".region", credentials("region"))
  hconf.set(prefix + ".public", "True")
}
var credentials_1 = scala.collection.mutable.HashMap[String, String](
  "auth_url" -> "https://identity.open.softlayer.com",
  "project" -> "objexxxxxxxxxxxxx858",
  "project_id" -> "f4xxxxxxxxxxxxxxa7",
  "region" -> "dallas",
  "user_id" -> "e4fc7294xxxxxxx5",
  "domain_id" -> "7527xxxxxxxxxxx44f",
  "domain_name" -> "9xxxxx9",
  "username" -> "Admin_",
  "password" -> """xxxxxxxxxxxxx""",
  "filename" -> "scores.dat",
  "container" -> "notebooks",
  "tenantId" -> "s69xxxxxxxxxxxxx4f0"
)
credentials_1("name") = "spark"
setConfig(credentials_1)
val file = sc.textFile("swift://notebooks." + credentials_1("name") + "/" + credentials_1("filename"))
file.take(5)
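`file.take(5)` returns the first lines of the file as plain strings; a CSV such as `block_1.csv` from the question can then be split into fields with ordinary Scala. A minimal sketch (the sample lines below are invented stand-ins, not data from the real file):

```scala
// Stand-ins for the Array[String] that file.take(5) would return
val sampleLines = Array("id,score", "1,0.75", "2,0.92")

// Split each line on commas; treat the first line as a header row
val header = sampleLines.head.split(",")
val rows = sampleLines.tail.map(_.split(","))

println(header.mkString(" | "))          // id | score
rows.foreach(r => println(r.mkString(" | ")))
```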
Upvotes: 2
Reputation: 2414
The Object Storage Insert to Code option appears to only dump in a list of properties. I came up with this small Scala helper to extract the properties from the string it dumps in:
import scala.collection.breakOut

val YOUR_DATASOURCE = """<<paste_your_datasource_attributes_here>>"""

def setConfig(name: String, dsConfiguration: String): Unit = {
  // Note the trailing dot: without it the keys would come out as
  // "fs.swift.service.sparkauth.url" instead of "fs.swift.service.spark.auth.url"
  val pfx = "fs.swift.service." + name + "."
  // Split each "key : value" line on the first colon only,
  // so URLs containing colons stay intact
  val settings: Map[String, String] = dsConfiguration.split("\\n").
    map(l => (l.split(":", 2)(0).trim(), l.split(":", 2)(1).trim()))(breakOut)
  // Swift settings belong on the Hadoop configuration, not the SparkConf
  val conf = sc.hadoopConfiguration
  conf.set(pfx + "auth.url", settings.getOrElse("auth_url", ""))
  conf.set(pfx + "tenant", settings.getOrElse("tenantId", ""))
  conf.set(pfx + "username", settings.getOrElse("username", ""))
  conf.set(pfx + "password", settings.getOrElse("password", ""))
  conf.set(pfx + "apikey", settings.getOrElse("password", ""))
  conf.set(pfx + "auth.endpoint.prefix", "endpoints")
}

setConfig("spark", YOUR_DATASOURCE)
Copy this into the notebook, put your cursor on the empty line between the triple quotes ("""), and then click the Insert to Code link for your file.
If it works correctly, you should then be able to build the swift URL to your file:
val file = sc.textFile("swift://notebooks.spark/TheFileYouClickedOn.txt")
In this case, notebooks is the container name, spark is the data source name (the first argument to the setConfig function), and TheFileYouClickedOn.txt is the filename.
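In other words, the URL is assembled as `swift://<container>.<datasource name>/<filename>`. As a quick sanity check, that composition can be written as a small helper (a sketch; `swiftUrl` is a hypothetical function, not part of the code above):

```scala
// Build a swift:// URL from its three parts: container, datasource name, filename
def swiftUrl(container: String, name: String, filename: String): String =
  s"swift://$container.$name/$filename"

val url = swiftUrl("notebooks", "spark", "TheFileYouClickedOn.txt")
println(url)  // swift://notebooks.spark/TheFileYouClickedOn.txt
```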
All put together it would look something like this:
import scala.collection.breakOut

def setConfig(name: String, dsConfiguration: String): Unit = {
  // Trailing dot so the keys join as "fs.swift.service.spark.auth.url", etc.
  val pfx = "fs.swift.service." + name + "."
  // Split each "key : value" line on the first colon only
  val settings: Map[String, String] = dsConfiguration.split("\\n").
    map(l => (l.split(":", 2)(0).trim(), l.split(":", 2)(1).trim()))(breakOut)
  // Swift settings belong on the Hadoop configuration, not the SparkConf
  val conf = sc.hadoopConfiguration
  conf.set(pfx + "auth.url", settings.getOrElse("auth_url", ""))
  conf.set(pfx + "tenant", settings.getOrElse("tenantId", ""))
  conf.set(pfx + "username", settings.getOrElse("username", ""))
  conf.set(pfx + "password", settings.getOrElse("password", ""))
  conf.set(pfx + "apikey", settings.getOrElse("password", ""))
  conf.set(pfx + "auth.endpoint.prefix", "endpoints")
}
val YOUR_DATASOURCE = """auth_url : https://identity.open.softlayer.com
project : object_storage_abc123
project_id : abc123abc123abc123abc123abc123
region : dallas
user_id : 123abc123abc123abc123abc123abc
domain_id : a1b2c3a1b2c3a1b2c3a1b2c3a1b2c3
domain_name : 123456
username : user_a1b2c3a1b2c3a1b2c3a1b2c3a1b2c3
password : WhateverPasswordValueGoesHere
filename : TheFileYouClickedOn.txt
container : notebooks
tenantId : a1b2c3-a1b2c3a1b2c3-a1b2c3a1b2c3
"""
setConfig("spark", YOUR_DATASOURCE)
val file = sc.textFile("swift://notebooks.spark/TheFileYouClickedOn.txt")
// Do stuff with your file.
You could also have this parse the filename and create the textFile reference for you, but I prefer to keep them separate: you only need to configure the connection to the one Object Storage instance once to use whichever files are located in it. It could probably also stand to have some empty-line detection, etc., but for now I just deal with that myself.
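For what it's worth, the "parse the filename and create the textFile reference" idea mentioned above could look something like this (a sketch under the same assumptions; `swiftPath` is a hypothetical helper that reuses the `split(":", 2)` parsing from setConfig and adds the empty-line detection):

```scala
// Hypothetical helper: parse the pasted datasource text and build the
// swift:// path from its "container" and "filename" entries, skipping blank lines.
def swiftPath(name: String, dsConfiguration: String): String = {
  val settings: Map[String, String] = dsConfiguration.split("\\n")
    .filter(_.trim.nonEmpty)                 // empty-line detection
    .map { l =>
      val kv = l.split(":", 2)               // split on the first colon only
      (kv(0).trim, kv(1).trim)
    }.toMap
  "swift://" + settings("container") + "." + name + "/" + settings("filename")
}

val ds = """filename : TheFileYouClickedOn.txt
container : notebooks

tenantId : a1b2c3"""
println(swiftPath("spark", ds))  // swift://notebooks.spark/TheFileYouClickedOn.txt
```

The result could then be passed straight to sc.textFile.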
Upvotes: 2
Reputation: 2346
Please see https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html, specifically the section on "Reusing existing Object Storage...". Which version of Object Storage are you interested in consuming from? (v1, v2, v3, or SL OS?)
Upvotes: 1