Reputation: 17676
is there a better option to read from mongodb via spark? currently I use https://github.com/Stratio/Spark-MongoDB
Do I understand correctly that
val mongoRDD = sql.fromMongoDB(mongoConfigurationData)
mongoRDD.registerTempTable("myTable")
is so slow because a lot of the data is scanned initially? How can it be that
sql.sql("CREATE TEMPORARY TABLE myTable USING mongoConfigurationData") seems to be even slower?
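For context, the Stratio data source is normally registered with an explicit provider name and an OPTIONS clause rather than a configuration object. A minimal sketch, with hypothetical host/database/collection values (the provider class name varies by Stratio release; recent versions use `com.stratio.datasource.mongodb`, older ones `com.stratio.provider.mongodb`):

```scala
// Hypothetical connection values for illustration only.
sqlContext.sql(
  """CREATE TEMPORARY TABLE myTable
    |USING com.stratio.datasource.mongodb
    |OPTIONS (host 'localhost:27017', database 'mydb', collection 'myCollection')
  """.stripMargin)

sqlContext.sql("SELECT * FROM myTable LIMIT 10").show()
```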
Upvotes: 0
Views: 3370
Reputation: 2155
You can read from MongoDB using the UnityJDBC driver together with the MongoDB Java Driver:
import mongodb.jdbc.MongoDriver
Import the two classes DataFrame and SQLContext:
import org.apache.spark.sql.{DataFrame, SQLContext}
Replace url with your MongoDB URL, dbtable with the name of the collection for which you want to create a DataFrame, and user and password with the credentials for your database server.
val url = "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb"
val dbtable = "Photos"
val user = "charles2588"
val password = "*****"
val options = scala.collection.Map("url" -> url, "driver" -> "mongodb.jdbc.MongoDriver", "dbtable" -> dbtable, "user" -> user, "password" -> password)
Now create a new SQLContext from your SparkContext, which has the JDBC driver loaded:
val sqlContext = new SQLContext(sc)
Create a DataFrameReader from your SQLContext for your table:
val dataFrameReader = sqlContext.read.format("jdbc").options(options)
Call the load method to create DataFrame for your table.
val tableDataFrame = dataFrameReader.load()
Call the show() method to display the table contents:
tableDataFrame.show()
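Putting the steps above together, a minimal end-to-end sketch — assuming the UnityJDBC MongoDB driver jar is on the Spark classpath and that the mLab host and credentials shown earlier are valid (they are placeholders here):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object MongoJdbcExample {
  def main(args: Array[String]): Unit = {
    // Local Spark context for a quick test run
    val conf = new SparkConf().setAppName("MongoJdbcExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // JDBC options for the UnityJDBC MongoDB driver
    // (url, dbtable, user, and password are placeholder values)
    val options = Map(
      "url"      -> "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb",
      "driver"   -> "mongodb.jdbc.MongoDriver",
      "dbtable"  -> "Photos",
      "user"     -> "charles2588",
      "password" -> "*****")

    // Build the DataFrame via the generic JDBC data source and display it
    val tableDataFrame: DataFrame = sqlContext.read.format("jdbc").options(options).load()
    tableDataFrame.show()

    sc.stop()
  }
}
```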
Ref: http://bigdataauthority.blogspot.com/2016/03/connecting-to-mongodb-from-ibm-bluemix.html
Thanks,
Charles.
Upvotes: 3