Georg Heiler

Reputation: 17676

spark read from mongodb

Is there a better option to read from MongoDB via Spark? Currently I use https://github.com/Stratio/Spark-MongoDB.

Do I understand correctly that

val mongoRDD = sql.fromMongoDB(mongoConfigurationData)
mongoRDD.registerTempTable("myTable")

is so slow because a lot of the data is scanned initially? How can it be that

sql.sql("CREATE TEMPORARY TABLE myTable USING mongoConfigurationData)") seems to be slower?

Upvotes: 0

Views: 3370

Answers (1)

charles gomes

Reputation: 2155

You can read from MongoDB using the Unity JDBC driver together with the MongoDB Java Driver:

import mongodb.jdbc.MongoDriver

Import the two classes DataFrame and SQLContext from Spark SQL:

import org.apache.spark.sql.{DataFrame, SQLContext}

Simply replace url with your MongoDB URL, dbtable with the name of the collection for which you want to create a DataFrame, and user and password with the credentials for your MongoDB server.

val url = "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb"
val dbtable = "Photos"
val user = "charles2588"
val password = "*****"
val options = scala.collection.Map("url" -> url, "driver" -> "mongodb.jdbc.MongoDriver", "dbtable" -> dbtable, "user" -> user, "password" -> password)

Now create a new SQLContext from your SparkContext, which has the MongoDB JDBC driver on its classpath:

val sqlContext = new SQLContext(sc)

Create a DataFrameReader from your SQLContext for your table:

val dataFrameReader = sqlContext.read.format("jdbc").options(options)

Call the load method to create a DataFrame for your table:

val tableDataFrame = dataFrameReader.load()

Call the show() method to display the table contents:

tableDataFrame.show()
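If the goal, as in the question, is to query the collection with SQL, the loaded DataFrame can also be registered as a temporary table. A small sketch using the Spark 1.x API from above; the table name and query are only illustrative:

// Register the DataFrame under a name that Spark SQL can query
tableDataFrame.registerTempTable("Photos")

// Run an illustrative SQL query against the registered temp table
val firstPhotos = sqlContext.sql("SELECT * FROM Photos LIMIT 10")
firstPhotos.show()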

Ref: http://bigdataauthority.blogspot.com/2016/03/connecting-to-mongodb-from-ibm-bluemix.html

Thanks,

Charles.

Upvotes: 3
