Reputation: 81
A few days ago I installed MongoDB on my computer to run some tests for work: we have to migrate a huge quantity of data from a Postgres-based system to a MongoDB one. Since we don't know MongoDB (this is the first time we use it), we studied the documentation and ran some tests on a small DB with little data to measure performance... After many tests we still see a regression. I'll explain the context, so maybe somebody can tell me whether we did something wrong. We know which queries are the most "problematic", and I will write one of them here. In Postgres the query is something like this (I'll cut the unnecessary parts):
selectStmt varchar = 'SELECT station.radarmeteo_id,
date(datetime_range) AS datetime_range,
district.name AS district,
city.name AS city,
min_temperature::real / 10::real,
max_temperature::real / 10::real,
rainfall_daily::real / 10::real,
max_wind_speed::real / 10::real,
extract(epoch FROM datetime_range) as unix_datetime ';
fromStmt varchar = ' FROM measurement_daily
INNER JOIN station ON measurement_daily.station_id = station.id';
In MongoDB we wrote this:
db.measurement_daily.aggregate([
    {"$match": {"min_temperature": {"$gt": random.randint(-30, 14),
                                    "$lt": random.randint(18, 50)}}},
    {"$lookup": {"from": "station", "localField": "station_id",
                 "foreignField": "_id", "as": "scd"}},
    {"$unwind": "$scd"},
    {"$project": {"_id": 1, "min_temperature": 1, "max_temperature": 1,
                  "rainfall_daily": 1, "max_wind_speed": 1,
                  "radarmeteo_id": "$scd.radarmeteo_id",
                  "city_name": "$scd.city_name",
                  "district_name": "$scd.district_name"}},
    {"$out": "result"}
])
What I am asking here is: could it be written better? Is there a better way to get the same result? Are there any other optimizations we could use? We need the best possible response time, because the real DB will have something like 200,000,000 documents in this collection alone... And even here, with just 2 tables/collections of 1,000 (station) and 6,400 (measurement_daily) records/documents respectively, we get 3.5-4 s (Postgres) vs 30-32 s (MongoDB) as response time. (To measure performance on both systems, the query is repeated 200 times, which is why the totals are 3.5-4 s and 30-32 s for one query; this gives a "homogeneous" response time and minimizes interference from external causes.) Any help is really appreciated...
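Not an answer, but for context: here is a minimal sketch of the same pipeline as a plain Python function (PyMongo style), so the stage order can be inspected and unit-tested without a running server. The collection and field names come from the question; the commented-out index on `min_temperature` is only an assumption about what might help the `$match` stage, not something from the original setup:

```python
import random

def build_pipeline(lo_bounds=(-30, 14), hi_bounds=(18, 50)):
    """Build the aggregation pipeline from the question as a plain list,
    so it can be inspected or tested without a running mongod."""
    return [
        # Filter first, so $lookup only runs on matching documents.
        {"$match": {"min_temperature": {"$gt": random.randint(*lo_bounds),
                                        "$lt": random.randint(*hi_bounds)}}},
        {"$lookup": {"from": "station", "localField": "station_id",
                     "foreignField": "_id", "as": "scd"}},
        {"$unwind": "$scd"},
        {"$project": {"_id": 1, "min_temperature": 1, "max_temperature": 1,
                      "rainfall_daily": 1, "max_wind_speed": 1,
                      "radarmeteo_id": "$scd.radarmeteo_id",
                      "city_name": "$scd.city_name",
                      "district_name": "$scd.district_name"}},
        {"$out": "result"}
    ]

# Hypothetical usage, assuming a local mongod and a DB named "weather":
# from pymongo import MongoClient, ASCENDING
# db = MongoClient()["weather"]
# db.measurement_daily.create_index([("min_temperature", ASCENDING)])
# db.measurement_daily.aggregate(build_pipeline())
```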
Upvotes: 1
Views: 95
Reputation:
According to the MongoDB documentation: "When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents."
In your case it will look like:
"$lookup": {
"from":"station",
"localField":"station_id",
"foreignField":"_id",
"as": "scd"
unwinding: { preserveNullAndEmptyArrays: false }
}
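As a side note (my reading of the docs, not part of the answer above): the `unwinding` field is what the optimizer produces internally, and it is what you see in `explain` output; in your own pipeline you keep writing `$lookup` immediately followed by `$unwind` on the `as` field, with no stage in between, and the optimizer coalesces them for you. A small sketch of that adjacency check on a pipeline built as plain Python dicts (only the simple `{"$unwind": "$field"}` string form is handled here):

```python
def lookup_unwind_coalescible(pipeline):
    """Return True if the pipeline contains a $lookup immediately followed
    by a $unwind on that $lookup's "as" field, i.e. the shape the MongoDB
    optimizer can coalesce into a single $lookup stage."""
    for prev, nxt in zip(pipeline, pipeline[1:]):
        if "$lookup" in prev and nxt.get("$unwind") == "$" + prev["$lookup"]["as"]:
            return True
    return False

# The lookup/unwind pair from the question, in coalescible shape:
pipeline = [
    {"$lookup": {"from": "station", "localField": "station_id",
                 "foreignField": "_id", "as": "scd"}},
    {"$unwind": "$scd"},
]
```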
Upvotes: 1