Pierre

Reputation: 988

Query MongoDB with Spark

I am connecting MongoDB to Spark and I want to load data using a query.

df = sqlContext.read.format("com.mongodb.spark.sql").options(collection='test', query={'name': 'jack'}).load()
df.show()

But it returns the whole collection instead. How can I reproduce the equivalent of db.test.find({'name': 'jack'}) in Spark?

Upvotes: 2

Views: 1048

Answers (1)

zero323

Reputation: 330413

You can use filter or where to specify the condition:

from pyspark.sql.functions import col

df.filter(col("name") == "jack")
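
For reference, the same condition expressed with where and a SQL expression string (an equivalent alternative, not shown in the original answer; where is an alias for filter):

df.where("name = 'jack'")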

Either way, the condition will be converted to an aggregation pipeline:

When using filters with DataFrames or Spark SQL, the underlying Mongo Connector code constructs an aggregation pipeline to filter the data in MongoDB before sending it to Spark.
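
For completeness, a minimal end-to-end sketch under the question's setup. It assumes the connector is already configured (e.g. spark.mongodb.input.uri points at your database) and reuses the collection name test from the question:

from pyspark.sql.functions import col

# Load the collection without a query option; the connector pushes
# the predicate down to MongoDB when the DataFrame is filtered.
df = sqlContext.read.format("com.mongodb.spark.sql").options(collection="test").load()

jack = df.filter(col("name") == "jack")
jack.show()

# explain() prints the physical plan, where the pushed filter is visible.
jack.explain()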

Upvotes: 4
