Reputation: 25
I'm trying to create a MongoDB database that contains two collections: Students and Courses.
The first collection "students" contains:
from pymongo import MongoClient
import pprint
client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database
student = [{"_id":"0",
"firstname":"Bert",
"lastname":"Holden"},
{"_id":"1",
"firstname":"Sam",
"lastname":"Olsen"},
{"_id":"2",
"firstname":"James",
"lastname":"Swan"}]
students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())
The second collection "courses" contains:
from pymongo import MongoClient
import pprint
client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database
course = [{"_id":"10",
"coursename":"Databases",
"grades":"[{student_id:0, grade:83.442}, {student_id:1, grade:45.323}, {student_id:2, grade:87.435}]"}]
courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())
I then want to use aggregation to find a student and the corresponding courses with grade(s).
from pymongo import MongoClient
import pprint
client = MongoClient("mongodb://127.0.0.1:27017")
db = client["Database"]
pipeline = [
{
"$lookup": {
"from": "courses",
"localField": "_id",
"foreignField": "student_id",
"as": "student_course"
}
},
{
"$match": {
"_id": "0"
}
}
]
pprint.pprint(list(db.students.aggregate(pipeline)))
I'm not sure if the student_id/grade is implemented correctly in the "courses" collection, so that might be one reason why my arregation returns [].
The aggregate works if I create seperate courses for each student, but that seems like a waste of memory, so I would like to have one course with all the student_ids and grades in an array.
Expected output:
[{'_id': '0',
'firstname': 'Bert',
'lastname': 'Holden',
'student_course': [{'_id': '10',
'coursename': 'Databases',
'grade': '83.442',
'student_id': '0'}]}]
Upvotes: 0
Views: 179
Reputation: 8538
I was finally able to take a look at this..
TLDR; see Mongo Playground
This solution requires you to store grades
as an actual object vs a string.
Consider the following database structure:
db={
// Collection
"students": [
{
"_id": "0",
"firstname": "Bert",
"lastname": "Holden"
},
{
"_id": "1",
"firstname": "Sam",
"lastname": "Olsen"
},
{
"_id": "2",
"firstname": "James",
"lastname": "Swan"
}
],
// Collection
"courses": [
{
"_id": "10",
"coursename": "Databases",
"grades": [
{
student_id: "0",
grade: 83.442
},
{
student_id: "1",
grade: 45.325
},
{
student_id: "2",
grade: 87.435
}
]
}
],
}
You can achieve what you want using the following query:
db.students.aggregate([
{
$match: {
_id: "0"
}
},
{
$lookup: {
from: "courses",
pipeline: [
{
$unwind: "$grades"
},
{
$match: {
"grades.student_id": "0"
}
},
{
$group: {
"_id": "$_id",
"coursename": {
$first: "$coursename"
},
"grade": {
$first: "$grades.grade"
},
"student_id": {
$first: "$grades.student_id"
}
}
}
],
as: "student_course"
}
}
])
Upvotes: 1
Reputation: 10098
A couple of points worth mentioning..
But, nonetheless, here is a possible solution to your problem. I have revised the python code, including a redefinition of the aggregation...
The name of my test database is pythontest
as seen in this code example.
This database must exist prior to running the code else an error.
File students.py
from pymongo import MongoClient
import pprint
client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest
student = [{"_id":"0",
"firstname":"Bert",
"lastname":"Holden"},
{"_id":"1",
"firstname":"Sam",
"lastname":"Olsen"},
{"_id":"2",
"firstname":"James",
"lastname":"Swan"}]
students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())
Then the courses file. Notice the field grades
is no longer a string, but is a valid array object? Notice the student id is a string, and not an integer? (In reality, a stronger datatype such as UUID or int would likely be preferable).
File courses.py
from pymongo import MongoClient
import pprint
client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest
course = [{"_id":"10",
"coursename":"Databases",
"grades": [{ "student_id": "0", "grade": 83.442}, {"student_id": "1", "grade": 45.323}, {"student_id": "2", "grade": 87.435}]}]
courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())
... and finally, the aggregation file with the changed aggregation pipeline...
File aggregation.py
from pymongo import MongoClient
import pprint
client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest
pipeline = [
{ "$match": { "grades.student_id": "0" } },
{ "$unwind": "$grades" },
{ "$project": { "coursename": 1, "student_id": "$grades.student_id", "grade": "$grades.grade" } },
{
"$lookup":
{
"from": "students",
"localField": "student_id",
"foreignField": "_id",
"as": "student"
}
},
{
"$unwind": "$student"
},
{ "$project": { "student._id": 0 } },
{ "$match": { "student_id": "0" } }
]
pprint.pprint(list(db.courses.aggregate(pipeline)))
Output of running program
> python3 aggregation.py
[{'_id': '10',
'coursename': 'Databases',
'grade': 83.442,
'student': {'firstname': 'Bert', 'lastname': 'Holden'},
'student_id': '0'}]
The format of the data at the end of the program may not be as desired, but can be tweaked by manipulating the aggregation.
** EDIT **
So if you want to approach this aggregation from the student rather than approaching it from the course you can still perform that aggregation, but because the array is in courses the aggregation will be a bit more complicated. The $lookup must utilize a pipeline itself to prepare the foreign data structures:
Aggregation from Student perspective
db.students.aggregate([
{ $match: { _id: "0" } },
{ $addFields: { "colStudents._id": "$_id" } },
{
$lookup:
{
from: "courses",
let: { varStudentId: "$colStudents._id"},
pipeline:
[
{ $unwind: "$grades" },
{ $match: { $expr: { $eq: ["$grades.student_id", "$$varStudentId" ] } } },
{ $project: { course_id: "$_id", coursename: 1, grade: "$grades.grade", _id: 0} }
],
as: "student_course"
}
},
{ $project: { _id: 0, student_id: "$_id", firstname: 1, lastname: 1, student_course: 1 } }
])
Output
> python3 aggregation.py
[{'firstname': 'Bert',
'lastname': 'Holden',
'student_course': [{'course_id': '10',
'coursename': 'Databases',
'grade': 83.442}],
'student_id': '0'}]
Upvotes: 1