Simon
Simon

Reputation: 44

MySQL : SELECT all USERS With 2 messages open and clicked after timestamp

I'm making a MySQL script executed in PHP. So I have 3 tables.

Messages who contains at least 3 000 000 rows (userid, messageid, timestamp, received, opened, clicked, deliveryid)

Users (user(unique), profile, profile_actual_timestamp, last_delivery_id_sent)

events_clicked_data who contains at least 2 000 000 rows and detailed clicked event. This user click on this profile(like GAMES) on this message(deliveryId) (userid, profile, deliveryId, eventDate)

So here is sample data for messages and users tables.

CREATE TABLE messages
    (`user_id` varchar(100), `message_id` int, `timestamp` datetime, `received` varchar(5), `opened` varchar(5), `clicked` varchar(5), `delivery_id` int);

INSERT INTO messages
    (`user_id`, `message_id`, `timestamp`, `received`, `opened`, `clicked`, `delivery_id`)
VALUES
    ("BillyStuff", 12,'2016-05-16 00:00:00', 'true', 'true', 'true', 8),
    ("BillyStuff", 11,'2016-05-14 00:00:00', 'true', 'true', 'true' , 7),
    ("BillyStuff", 8,'2016-04-03 00:00:00', 'true', 'false', 'false' , 6),
    ("BillyStuff", 4,'2016-04-02 00:00:00', 'true', 'false', 'false', 5),
    ("JohnDoe", 15 ,'2016-05-16 00:00:00', 'true', 'true', 'false' , 4),
    ("JohnDoe", 13 ,'2016-05-14 00:00:00', 'true', 'true', 'true', 3),
    ("Donnie", 15 ,'2016-05-16 00:00:00', 'true', 'true', 'true' , 4),
    ("Donnie", 13 ,'2016-05-14 00:00:00', 'true', 'true', 'true', 3)

CREATE TABLE users
    (`user_id` varchar(100), `profile` varchar(100), `profile_actual_timestamp` datetime, `last_delivery_id_sent` int);

INSERT INTO users
    (`user_id`, `profile`, `profile_actual_timestamp`, `last_delivery_id_sent`)
VALUES
    ("BillyStuff", "Game", "2016-01-01 00:00:00", 1),
    ("JohnDoe", "Book", "2016-01-01 00:00:00", 1),
    ("Donnie", "Book", "2016-05-16 00:00:00", 4)

I want to get users with 2 messages clicked after timestamp (profile_actual_timestamp means last time it was updated) in profile.

In this case I only get BillyStuff because Donnie is already up to date if I check if profile_actual_timestamp.

After this, I need to check by deliveryId and user in events_clicked's table if same categories was clicked.

CREATE TABLE events_clicked_data
    (`user_id` varchar(100), `profile` varchar(100), `deliveryId` int, `eventDate` datetime);

INSERT INTO users
    (`user_id`, `profile`, `deliveryId`, `eventDate`) VALUES
    ("BillyStuff", "Book", 8,"2016-01-01 00:00:00"),
    ("BillyStuff", "Book", 7,"2016-01-01 00:00:00"),
    ("JohnDoe", "Book", 3,"2016-01-01 00:00:00"),
    ("Donnie", "Book", 4,"2016-05-16 00:00:00"),
    ("Donnie", "Game", 3,"2016-05-16 00:00:00")

In this case, i need to update BillyStuff's profile and change it to "Book" instead of "Game" because he clicked on the same categorie twice in his last messages after the last time he was updated (profile_actual_timestamp)

So its been a really good puzzle for me this week and Im wondering if you guys can help me with this one.

originalid = userid (not necessary original, depend on table)
e.name = name of profil clicked like game.

select  originalid,
        name
from   (
        select  @g := if(@u = originalid, if (@p = name, @g, @g + 1), 1) as grp,
                @u := originalid as originalid,
                @p := name as name
        from   (
                select     u.originalid, 
                           m.message_sendtime_timestamp,
                           e.name
                from       bienvenue_nouveau_client_dev u
                inner join messages_nouveaux_clients m
                        on m.originalid = u.originalid
                inner join events_clicked_data e
                        on e.originalId = u.originalid
                       and e.deliveryId = m.deliveryId
                where      m.message_sendtime_timestamp >= u.profil_actuel_timestamp
                and        m.clicked = 'TRUE'
                limit      1000000000000000
                order by   u.originalid,
                           m.message_sendtime_timestamp desc
               ) alias
        ) alias2
where    grp = 1
group by originalid, name
having   count(*) > 1

Whatever I change I got an error like this:

1250 - Table 'u' from one of the SELECTs cannot be used in global ORDER clause

Upvotes: 0

Views: 75

Answers (1)

trincot
trincot

Reputation: 350667

The first query can be done as follows:

select     u.*
from       users u
inner join messages m
        on m.user_id = u.user_id
where      m.timestamp >= u.profile_actual_timestamp
and        m.clicked = 'true'
group by   u.user_id
having     count(m.message_id) > 1

This second query will give you the users that used the same profile in their last two click events, if these events happened both later than the last update:

select  user_id,
        profile
from    (
        select  @g := 0 + if(@u = user_id, if (@p = profile, @g, @g + 1), 1) as grp,
                @u := user_id as user_id,
                @p := profile as profile  
        from    (
                select     u.user_id, 
                           m.timestamp,
                           e.profile
                from       users u
                inner join messages m
                        on m.user_id = u.user_id
                left join  events_clicked_data e
                        on e.user_id = u.user_id
                       and e.deliveryId = m.delivery_id
                where      m.timestamp >= u.profile_actual_timestamp
                order by   u.user_id,
                           m.timestamp desc
                ) alias
        ) alias2
where grp = 1
group by user_id, profile
having  count(*) > 1

SQL fiddle

This query depends on variables, and is a bit risky, in that it must return the innermost results in the specified order, and that it must evaluate the middle select list (with variable assignments) in top-to-bottom order. This happens consistently, but in theory it is not guaranteed.

The (high) limit in the inner query is there to make sure the order by clause is applied, not to actually limit anything.

The variables @u and @p track the user_id and profile. Based on their previous values, the variable @g is calculated: it is reset to 1 whenever the user_id changed. Otherwise it is kept at the same value if also the profile didn't change, and it is incremented when the profile did change.

This way the grp values identify groups in which the profile is the same and uninterrupted in order of descending timestamp. The most recent group per used has number 1, which is the only one of interest in the outer query. The outer query then requires that this group 1 has more than one record (i.e. more than one occurrence of the same profile).

Upvotes: 1

Related Questions