AwokeKnowing
AwokeKnowing

Reputation: 8226

regular expression to extract json values in mysql field

I have a "users" table with an "assignments" field that has a list of course IDs and when then are assigned and whether they are required or optional in one json-like string (missing the top-level braces)

"BUS1077":{"startDate":"2013-09-16","hasPrerequisite":"","list":"required"},
"CMP1042":{"startDate":"2013-09-16","hasPrerequisite":"","list":"optional"},
"CMP1108":{"startDate":"2013-09-16","hasPrerequisite":"","list":"required"}

I have a another table, called "progress" that lists the course ids, like BUS1078, and whether they are completed or not.

I need a query to select the users who have completed all their required courses.

somthing like:

SELECT userid FROM users 
where (count([ids from users.assignments where list:"required"] as courseid)
      =count([extracted ids] joined using( courseid) where "complete"=1))

so there are just two tables

users (userid,assignments)
progress (id,userid,courseid,complete)

in the end I want to have selected the userids where each REQUIRED course is complete

(note, the database itself is much more complex, but this represents the gist of the problem)

Upvotes: 0

Views: 4433

Answers (3)

Mischa
Mischa

Reputation: 99

As of MySQL 5.1 you can do this with built-in functions of common_schema you can use for this purpose. I haven't used it myself but I've found a nice blog about how you can parse JSON stored data and do something usefull with it.

The blog: http://mechanics.flite.com/blog/2013/04/08/json-parsing-in-mysql-using-common-schema/

Upvotes: 2

Tin Tran
Tin Tran

Reputation: 6202

if your courseid is always 7 characters long and the list in assignments field can have up to maximum of 10 courses you can use this sqlFiddle

SELECT U.userId
FROM users U
WHERE NOT EXISTS
  (SELECT 1 FROM
        (SELECT users.userid,courseName,
              (Assignments REGEXP CONCAT('"',courseName,'"[^}]+(:"required"})'))as Required,
               Assignments,
               courseid,complete
         FROM
           (SELECT userid,courseName FROM
                (SELECT userid,SUBSTRING_INDEX(SUBSTRING_INDEX(assignments,'":{"startDate',course.num),'"',-1) as courseName
                 FROM users,(SELECT 1 as num 
                             UNION SELECT 2
                             UNION SELECT 3
                             UNION SELECT 4
                             UNION SELECT 5
                             UNION SELECT 6
                             UNION SELECT 7
                             UNION SELECT 8
                             UNION SELECT 9
                             UNION SELECT 10)course
                 )T WHERE LENGTH(courseName)=7
           )Courses
         INNER JOIN users ON users.userid = Courses.userid
         LEFT JOIN progress ON users.userid = progress.userid
                       AND Courses.courseName = progress.courseId
                       AND progress.complete = 1
        )AllCourses
   WHERE AllCourses.userId = U.userId
   AND AllCourses.Required = 1
   AND Complete IS NULL
   )

What the query does is it grabs the courseName(s) from assignment fields and see if it's required and sets required flag, then LEFT JOIN with progress and we have the Required column and Complete is NULL when the course doesn't exist in progress or when complete is not 1. We then select user id WHERE there does not EXISTS (a record in their Courses where Required = 1 AND Complete IS NULL)

In the fiddle, I have user 2 having only completed an optional course. So userId 2 is not returned.

You can just run the inner select for AllCourses subquery and see the data of all the courses for all users and whether they completed a course that is required or not.

Upvotes: 0

richardtallent
richardtallent

Reputation: 35374

I'm not familiar with the RegEx implementation in MySQL, but this basic approach should work:

SELECT userid FROM users WHERE NOT EXISTS(
     SELECT NULL FROM assignments WHERE NOT EXISTS(
        SELECT NULL FROM progress WHERE
            progress.userid = users.userid
            AND REGEXMATCH(
               assignments.assignment, 
               '(^|,)"' + progress.courseid + '":.*?"list":"required"\}') >= 0
            )
        )
     )

This should find all users where there is not a required assignment that the user hasn't completed.

Given the course IDs and the word "required" are unlikely to appear out of context, the regular expression itself could likely be much more naive, such as:

  '"' + progress.courseid + '"[^}]+"required"'

I don't know about MySQL's current limitations when it comes to correlated subqueries, but the same thing could be accomplished with joins. Using EXISTS should be preferred over COUNT, since counting requires aggregation across the entire dataset rather than allowing a short-cut on the first non-match found.

Upvotes: 1

Related Questions