Jones
Jones

Reputation: 83

MYSQL JSON select query returning different results each time it is run

I have a simple select query on JSON data with a few simple calculations which return different results when I run it several times.

For the life of me, I just cannot find out why and so far no luck in my search for an answer. I can see from the query stats that a temporary table is built, which I imagine fails to truncate/ is saved somewhere upon completion of the query, but I am unable to locate (let alone truncate/drop) this table. It is just a working hypothesis though - the answer might be in a setting or ...

Can anyone find any logic to this?

I am using MySQL Workbench on MySQL DB on local instance. MySQL: Version 8.0.19 MySQL Community Server - GPL.

This is a simplified version of the table:

CREATE TABLE regulation_entries (
    id INTEGER UNSIGNED AUTO_INCREMENT,
    employee_id VARCHAR(10) NOT NULL,
    regulation JSON NOT NULL,
    PRIMARY KEY (id)
    );

# Sample data to work with    

INSERT INTO regulation_entries VALUES
    (DEFAULT, 2, '{"entry_base_salary": "8000", "pension_pct": "0.08"}'),
    (DEFAULT, 3, '{"entry_base_salary": "10000", "pension_pct": "0.08"}'),
    (DEFAULT, 5, '{"entry_base_salary": "11000", "pension_pct": "0.08"}'),
    (DEFAULT, 8, '{"entry_base_salary": "11000", "pension_pct": "0.08"}'),
    (DEFAULT, 9, '{"entry_base_salary": "9000", "pension_pct": "0.08"}'),
    (DEFAULT, 1, '{"entry_base_salary": "14000", "pension_pct": "0.10"}'),
    (DEFAULT, 6, '{"entry_base_salary": "13000", "pension_pct": "0.08"}'),
    (DEFAULT, 7, '{"entry_base_salary": "14000", "pension_pct": "0.08"}'),
    (DEFAULT, 2, '{"base_salary_adjustment": "500"}'),
    (DEFAULT, 9, '{"base_salary_adjustment": "800"}'),
    (DEFAULT, 3, '{"base_salary_adjustment": "400"}'),
    (DEFAULT, 5, '{"base_salary_adjustment": "350"}'),
    (DEFAULT, 8, '{"base_salary_adjustment": "200"}'),
    (DEFAULT, 9, '{"base_salary_adjustment": "1250"}'),
    (DEFAULT, 1, '{"base_salary_adjustment": "-200"}'),
    (DEFAULT, 1, '{"base_salary_adjustment": "50"}'),
    (DEFAULT, 6, '{"base_salary_adjustment": "700"}'),
    (DEFAULT, 7, '{"base_salary_adjustment": "825"}');

# The query that is bugging me:

SELECT employee_id, 
        SUM(regulation->>'$.entry_base_salary') AS entry_base_salary,
        regulation->>'$.pension_pct' AS pension_pct,
        AVG(regulation->>'$.pension_pct') * SUM(regulation->>'$.entry_base_salary') AS entry_pension,
        SUM(regulation->>'$.base_salary_adjustment') AS salary_adjustments,
        SUM(regulation->>'$.entry_base_salary') + SUM(regulation->>'$.base_salary_adjustment') AS future_salary,
        AVG(regulation->>'$.pension_pct') * (SUM(regulation->>'$.entry_base_salary') + SUM(regulation->>'$.base_salary_adjustment')) AS future_pension 
    FROM sreg.regulation_entries 
    GROUP BY employee_id
    ORDER BY employee_id

When I run the query I expect to see this:

employee_id #entry_base_salary  #pension_pct    #entry_pension  #salary_adjustments #future_salary  #future_pension#
#1          #14000              #0.10           #1400           #1250               #15250          #1525
#2          #8000               #0.08           #640            #1100               #9100           #728
#3          #10000              #0.08           #800            #1000               #11000          #880
#5          #11000              #0.08           #880            #950                #11950          #956
#6          #13000              #0.08           #1040           #1900               #14900          #1192
#7          #14000              #0.08           #1120           #2025               #16025          #1282
#8          #11000              #0.08           #880            #800                #11800          #944
#9          #9000               #0.08           #720            #2650               #11650          #932

But if I Execute the same query again, I get new random results. e.g.:

employee_id #entry_base_salary  #pension_pct    #entry_pension  #salary_adjustments     #future_salary  #future_pension#

1           #14000              #0.10           #1400           #3050                   #17050          #1705
2           #8000               #0.08           #640            #2900                   #10900          #872
3           #10000              #0.08           #800            #2800                   #12800          #1024
5           #11000              #0.08           #880            #2750                   #13750          #1100
6           #13000              #0.08           #1040           #3700                   #16700          #1336
7           #14000              #0.08           #1120           #3825                   #17825          #1426
8           #11000              #0.08           #880            #2600                   #13600          #1088
9           #9000               #0.08           #720            #4450                   #13450          #1076

Another example

1           #14000              #0.10           #1400           #2.119191149652875e88   #15250          #1525
2           #8000               #0.08           #640            #2.119191149652875e88   #9100           #728
3           #10000              #0.08           #800            #2.119191149652875e88   #11000          #880
5           #11000              #0.08           #880            #2.119191149652875e88   #11950          #956
6           #13000              #0.08           #1040           #2.119191149652875e88   #14900          #1192
7           #14000              #0.08           #1120           #2.119191149652875e88   #16025          #1282
8           #11000              #0.08           #880            #2.119191149652875e88   #11800          #944
9           #9000               #0.08           #720            #2.119191149652875e88   #11650          #932

And a third

1           #14000              #0.10           #1400           #3650                   #17650          #1765
2           #8000               #0.08           #640            #3500                   #11500          #920
3           #10000              #0.08           #800            #3400                   #13400          #1072
5           #11000              #0.08           #880            #3350                   #14350          #1148
6           #13000              #0.08           #1040           #4300                   #17300          #1384
7           #14000              #0.08           #1120           #4425                   #18425          #1474
8           #11000              #0.08           #880            #3200                   #14200          #1136
9           #9000               #0.08           #720            #5050                   #14050          #1124

Have any one seen this before? Any explanation? Or better yet, does anyone know what I have to change to obtain consistent results?

Upvotes: 3

Views: 150

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562260

This isn't specifically about JSON at all. This is a bug in MySQL 8.0.18, fixed in 8.0.20. The bug can occur with non-JSON columns.

https://bugs.mysql.com/bug.php?id=97920 Aggregation function [sum()] return random numbers

In fact the numbers are not wholly random, but are increasing each time one runs the query. It's like the SUM() is accumulating the sum in a temp table, and the same temp table is used for subsequent runs of the query, without zeroing out the sum.

Here's the result from the test case given in the bug above:

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |          22675 |            227 |
+--------+----------------+----------------+

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |          22675 |            454 |
+--------+----------------+----------------+

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |          22675 |            682 |
+--------+----------------+----------------+

As I test, it continues to increase by 227 each time I run the query. 227 is the correct result, and was returned the first time I ran the query.

Furthermore, the problem does not occur if the underlying data to be summed does not include any NULLs.

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test where vc is not null GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |            252 |            227 |
+--------+----------------+----------------+

The correct value of 227 is returned every time I run this query.

The same thing happens with your query. Since your expression regulation->>'$.base_salary_adjustment' returns NULL on rows where the JSON does not contain that key, it exhibits the same bug when using SUM().

If I modified the query to convert NULL to 0, it gives the correct result and does not change no matter how many times I run the query.

SUM(COALESCE(regulation->>'$.base_salary_adjustment', 0)) AS salary_adjustments

This must be done each time you reference that JSON key (or any time you use SUM() on an expression that may be NULL.

I suggest you just upgrade to MySQL 8.0.20.

Upvotes: 2

Related Questions