Reputation: 1

SQL Multi Table and Multi Column Select

I'm making a mysql database that has one table for each student in a school, and in each table it then has the timetable of each student. I need to be able to run a script that will search every table in the database and every column for 2 values. For example, it needs to search all tables and columns for teacher "x" where day_week = MondayA. In the table, there are 11 columns total, one for the day_week then 5 for period lesson (so period 1 lesson, period 2 lesson ect) then another 5 for the teacher they have for each period.

Any help would be much appreciated.

Thanks.

Upvotes: 0

Answers (4)

Tim Hoolihan

Reputation: 12396

First, it's worth noting this is probably not the best approach. A table per student sounds like a bad idea. You are going to be generating massive amounts of dynamic queries and not able to leverage indexing, so performance will suffer. I would highly recommend finding an approach to get the tables into one table and time series into a join table. Or look at a noSQL (non-relational approach). A document database seems like it might be a fit here.

That said, to answer your question: You need to query the schema (information_schema tables) for lists of tables and columns and then loop through querying the tables.

Start with the mysql docs here on information_schema

Upvotes: 3

Tynach

Reputation: 182

Fix your schema

First of all, your schema sounds very bad. Every time you add a new student, you have to change it (add a new table), and if this were for a real school, that would be an absolute disaster! Changing the schema is more expensive than simply inserting a row into a table, and if your web application can directly change the database, then any security exploits that might be exposed could potentially lead to people messing with your tables without you realizing it.

On top of that, it makes querying, say, the number of students an absolute pain. Ideally, your data should be laid out in a way that lets you answer any and all questions you might ever have for it. Not just questions you have now, but further down the road.

And if that's not bad enough, it makes querying a nightmare. You have to keep track of the number of tables somehow, and their names, so that every time you query information it's running an entirely different query. Some queries, like 'List students that joined in the last year', grow in size, complexity, and time to run as the list of students (the number of tables) grows. This may be what you're running into already, though it's hard to tell simply from your question.

Normalization

Normalization is, put simply, 'Designing the schema well'. It's a bit of a vague topic, but it's broken down into varying levels; and each level depends on the last.

To be perfectly honest, I don't understand the wording of the different levels, and I'm a little bit of a newb at databases myself, but here is the gist of normalization, from what I've been taught:

Every value means one, small, simple thing

Basically, don't go crazy and put a bunch of stuff in a single column. It's bad design to have a column like, 'Categories', and the value be a long string that reads like, "Programming, Databases, Web Development, MySQL, Cows".

First of all, parsing strings is time consuming, especially the longer they are, and second of all, if those categories are associated with anything else - like, perhaps you have a table of categories for people to choose from - then now you're checking larger strings for the contents of smaller strings. If you want to pull up every item of a certain category, you will be matching that string against the ENTIRE database... Which can be excruciatingly slow.

I'm not sure if this is part of normalization, but what I've learned to do is to make a numeric 'ID' for everything I refer to in more than one table. For example, instead of a database table that has the columns 'Name', 'Address', 'Birthday', I'll have, 'ID', 'Name', 'Address', 'Birthday'. ID would be a unique number for every row, a primary key, and if at any time I wanted to refer to ANY of the people in it, I'd just use that number.

Numbers are much quicker to compare/match, much quicker to look up, and overall much nicer for the database to deal with, and let you create queries that run at very tiny fractions of the amount of time as with a string-based database.

To complete the example, you could have three tables; say, 'Articles', 'Categories', and 'Article_Categories'.

'Articles' would hold all the actual articles and their properties. Something like, 'ID', 'Title', 'Content'.

'Categories' would hold all of the individual categories available, with 'ID' and 'Category' fields.

'Article_Categories' would hold the combinations of articles to categories; a unique combination of 'Article_ID' and 'Category_ID'.

What this might look like:

Articles
- 1, 'Web Cow Geniuses', 'Cows have been shown to know how to create great databases for websites using MySQL.';
- 2, 'Why to use MySQL', "It's free, duh!";
Categories
- 1, Cows;
- 2, Databases;
- 3, MySQL;
- 4, Programming;
- 5, Web Development;
Article_Categories
- 1, 1;
- 1, 2;
- 1, 3;
- 1, 4;
- 1, 5;
- 2, 2;
- 2, 3;

Notice that each combination in 'Article_Categories' is unique; you never see, for example, '1, 3' twice. But '1' is in the first column multiple times, and '3' is in the second column multiple times.

This is called a 'many to many' table. You use it when you have a relationship between two data sets, where there are multiple combinations for mixing them. Essentially, where any number of items in one can correspond to any number of items from the other.

Do not mix data and metadata

Basically, data is the content of the tables. The values inside the rows. Metadata is the tables themselves; the table names, the value types, and the relationships between two different sets of data.

Metadata inside data

Here's an example of putting metadata inside data:

A 'People' table that has, as columns, 'isStudent' and 'isTeacher'.

When data is put in 'People', you might have a row where they are both a teacher and a student, so you put something like 'ID', 'Name', 'yes', 'yes'. This doesn't sound bad, and there may well be a teacher who's taking classes at the same school so it is possible.

However, it takes up more space since you have to have a value of some sort in both columns, even if they are only one or the other.

A better way to make this would be to split it out into three separate tables:

A 'People' table that has an ID, name, and other data that every person has.
A 'Students' table that uses only the values of the 'People.ID' as data.
A 'Teachers' table that uses only the values of the 'People.ID' as data.

This way, everybody who is a student gets referenced to in 'Students', and everyone who's a teacher gets referenced in 'Teachers'. As mentioned previously, we use the 'ID' field because it's quicker to match up across tables. Now, there are only as many Teachers referenced as there need to be, and the same goes for Students. This initially takes up more space due to the size overhead of having them as separate tables, but as the database grows, this is more than made up for.

This also allows you to reference teachers directly. Say you have a table of 'Classes', and you only want Teachers capable of being the, well, Teacher. Your 'Classes' table, in the 'Teachers' column, can have a foreign key to 'Teachers.ID'. That way, if a Student hacks the database and tries to put themselves as teaching a class somehow, it's impossible for them to do so.

Data inside metadata

This is quite similar to what you appear to be having problems with.

Data is, essentially, what it is we are trying to store. Student names, teacher names, schedules for both, etc. However, sometimes we put data - like a student's name - inside of metadata - like the name of a table.

Whenever you see yourself regularly adding onto or changing the schema of a database, it is a HUGE sign that you are putting data inside of metadata. In your case, every student having their own table is essentially putting their name in the metadata.

Now, there are times where you kinda want to do this, when the number of tables will not change THAT often. It can make things simpler.. For example, if you have a website selling underwear, you might have both 'Mens_Products' and 'Womens_Products' tables. Obviously the 'neater' solution would be to have a 'Product_Categories' table, in case you want to add transgender products or other sell products to both genders, but in this case it doesn't matter that much. It wouldn't be hard to add a 'Trans_Products' table, and it's not like you'd be adding new tables frequently.

Do not duplicate data

At first, this'll sound like I'm contradicting EVERYTHING I've just said. "How am I supposed to copy those IDs everywhere if I'm not supposed to duplicate data?!" But alas, that's not exactly what I mean. In fact, this is another reason for having a separate ID for each item you might refer to!

Essentially, you don't want to have to update more data than you need to. If, for example, you had a 'Birthday' column in your 'Students' and your 'Teachers' tables in the above example, and you had someone who was both a Student and a Teacher, suddenly their birthday is recorded in two different spots! Now, what if the birthday was wrong, and you wanted to change it? You'd have to change it twice!

So instead, you put it in your 'People' table. That way, for each person, it only exists once.

This might seem like an obvious example, but you'd be surprised at how often it can occur by accident. Just be careful, and watch for anything that requires you to update the same value in two different locations.

Queries

So, with all that out of the way, how should you query? What sort of SELECT statement should you use?

Lets say you have the following schema (primary key in bold):

People:
- ID
- Name (Unique)
- Birthday
Teachers:
- People_ID (Foreign: People.ID)
Students:
- People_ID (Foreign: People.ID)
Classes:
- ID
- Name (Unique)
- Teacher_ID (Foreign: Teachers.ID)
Class_Times:
- Class_ID (Foreign: Classes.ID)
- Day (Enum: 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')
- Start_Time
Student_Classes:
- Student_ID (Foreign: Students.ID)
- Class_ID (Foreign: Classes.ID)

First note that 'Student_Classes' has two primary keys... This makes the combination of the two unique, not the individual ones. This makes it a many-to-many table, as discussed earlier. I did this also for 'Class_ID' and 'Day' so that you wouldn't put the class twice on the same day.

Also, it may be bad that we use an Enum for the days of the week... If we wanted to add Sunday classes, we'd have to change it, which is a change in the schema, which could potentially break things. However, I didn't feel like adding a 'Days' table and all that.

At any rate, if you wanted to find all of the teachers who were teaching on a Monday, you could just do this:

SELECT
    People.Name
FROM
    People
    LEFT JOIN
        Teachers
        ON
            People.ID = Teachers.People_ID
    LEFT JOIN
        Classes
        ON
            People.ID = Classes.Teacher_ID
    LEFT JOIN
        Class_Times:
        ON
            Classes.ID = Class_Times.Class_ID
WHERE
    Class_Times.Day = 'Monday';

Or, formatted in one big long string (like it'll be when you put it in your other programming langauge):

SELECT People.Name FROM People LEFT JOIN Teachers ON People.ID = Teachers.People_ID LEFT JOIN Classes ON People.ID = Classes.Teacher_ID LEFT JOIN Class_Times: ON Classes.ID = Class_Times.Class_ID WHERE Class_Times.Day = 'Monday';

Essentially, here is what we do:

Select the main thing we want, the teacher's name. The name is stored in the 'People' table, so we select from that first.
We then left join it to the 'Teachers' table, telling it that all of the People we select must be a Teacher.
After that, we do the same with 'Classes'; narrowing it down to only Classes that the Teacher actually teaches themselves.
Then we also grab 'Class_Times' (important for the final step), but only for those Classes that the Teacher is teaching.
Finally, we specify that the Day the Class takes place must be a 'Monday'.

Upvotes: 3

Hamza

Reputation: 1583

You need to create one table for students and one for timetable and have foreign key of student in timetable. Use best practices, consider you have 1000 students, you will end up creating 1000 tables while database is there is make life easier. Create one table, add as many entries as you want.

Secondly, ask your question more clearly using this structure so we may be able to help you

Upvotes: 1

theB3RV

Reputation: 924

Table 1: Student: id firstName lastName

Table 2: Schedule: studentID day period classID

studentID(relates to Student.id)

classID(relates to Classes.id)

Table 3: Classes: id className teacherName

BOLD is primary key

This will gather all students that have that teacher:

Select S1.firstName, S1.lastName, C.teacherName from Student as S1 join Schedule as S2 join Classes as C where S1.id = S2.studentID and S2.classID = C.id and C.teacherName = XXXX

This will gather all students that are in a certain class:

Select S1.firstName, S1.lastName from Student as S1 join Schedule as S2 where S1.id = S2.studentID and S2.classID = XXXX