coolerfarmer
coolerfarmer

Reputation: 385

Query data from table based of entries in another table

So the title may be a little confusing, but it's hard to describe the problem in just a few words. So I have two tables that are important right now:

CREATE TABLE IF NOT EXISTS `private_crawler_urls` (
`id` int(11) NOT NULL,
  `url` text NOT NULL,
  `hash` varchar(47) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `private_crawler_url_checks` (
`id` int(11) NOT NULL,
  `url_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `total_proxies` int(11) NOT NULL,
  `working_proxies` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=78 DEFAULT CHARSET=latin1;

Now I want to select all the urls from private_crawler_urls where there's no entry in private_crawler_url_checks which is newer than a given Timespan (e.g. 10 Minutes). That's what I have right now:

SELECT
    u.id, u.url, c.created_at
FROM
    private_crawler_urls u
INNER JOIN
    private_crawler_url_checks c ON (c.url_id = u.id)
WHERE 
    c.created_at < NOW() - INTERVAL 10 MINUTE
ORDER BY c.created_at ASC

The problem is that this can't work as I don't want to delete the old entries in private_crawler_url_checks and so there's always an entry that's older than the given Timespan even if there's a fresh one in the table. As I'm pretty new to MySQL I have no Idea how I can accomplish this, so I need your help. Thanks! In case you need more information, just leave a comment!

Upvotes: 0

Views: 25

Answers (1)

Poul Kruijt
Poul Kruijt

Reputation: 71891

I think this will make you keep your old records and not interfere with your wish. Not all that certain about performance impact though:

SELECT
    pcu.url
FROM
    private_crawler_urls as pcu
WHERE
    pcu.id NOT IN (

        SELECT
            pcuc.url_id
        FROM
            private_crawler_url_checks as pcuc
        WHERE
            pcuc.created_at > DATE_SUB(NOW(), INTERVAL 10 MINUTE)       
)

First it selects all url_id, where there is a created_at in the last 10 minutes. After that it will select all url where the id is not in this list

Upvotes: 1

Related Questions