Reputation: 988
I am connecting MongoDB to Spark and I want to load data using a query.
df = sqlContext.read.format("com.mongodb.spark.sql").options(collection='test', query={'name': 'jack'}).load()
df.show()
But it returns the whole collection. How can I reproduce the result of the query db.test.find({'name': 'jack'}) in Spark?
Upvotes: 2
Views: 1048
Reputation: 330413
You can use filter or where to specify the condition:
from pyspark.sql.functions import col

# Keep only the documents whose name field equals "jack"
df.filter(col("name") == "jack")
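For context, here is a minimal end-to-end sketch of the same filter. It assumes the mongo-spark-connector package is on the classpath; the URI, database, and collection names are placeholders to adjust for your deployment:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Hypothetical connection settings; replace with your own.
spark = SparkSession.builder \
    .appName("mongo-filter-example") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/mydb.test") \
    .getOrCreate()

# Load the collection, then filter; the predicate is pushed down to MongoDB.
df = spark.read.format("com.mongodb.spark.sql").load()
df.filter(col("name") == "jack").show()

# df.where("name = 'jack'") is an equivalent way to express the same condition.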
The filter will be converted to an aggregation pipeline:
When using filters with DataFrames or Spark SQL, the underlying Mongo Connector code constructs an aggregation pipeline to filter the data in MongoDB before sending it to Spark.
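In other words, the filter above is pushed down as a single $match stage. A rough pymongo equivalent of what the connector runs against MongoDB (the client, database, and collection names here are assumptions) looks like this:

from pymongo import MongoClient

# Hypothetical connection details; adjust to your deployment.
client = MongoClient("mongodb://127.0.0.1:27017")
db = client["mydb"]

# Roughly the pipeline the connector builds for col("name") == "jack":
# a single $match stage, so only matching documents leave MongoDB.
for doc in db.test.aggregate([{"$match": {"name": "jack"}}]):
    print(doc)

You can also check the pushdown from the Spark side with df.filter(col("name") == "jack").explain(); the pushed predicate typically shows up in the scan node of the physical plan.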
Upvotes: 4