Alex Blokha
Alex Blokha

Reputation: 1171

Select bigger articles among two wikipedias

Using "Quarry" from Wmflabs (https://quarry.wmflabs.org/).

As I understand this is MySQL.

How can I select all articles from german wiki, which are bigger by size, than articles in enwiki?

Here is the code, I use. I need the de_len, but I don't know how to get it.

USE enwiki_p;

SELECT 
        page_title en_title,
        page_len en_len,
        ll.ll_title as de_title
FROM page
JOIN langlinks ll
    ON ll_from = page_id
WHERE page_namespace = 0
GROUP BY page_id  
HAVING max(CASE ll_lang  WHEN 'de' THEN 1 ELSE 0 END) = 0  
LIMIT 10

Upvotes: 0

Views: 191

Answers (1)

XXN
XXN

Reputation: 148

You need to work with minimum three tables from two databases to get properly page sizes of corresponing articles (pages) from two wikis, plus related interlanguage links for relationship, for a good comparison. A solution would be this:

SELECT 
       enp.page_title as en_title,
       enp.page_len as en_len,
       dep.page_title as de_title,
       dep.page_len as de_len        
FROM enwiki_p.page enp
JOIN enwiki_p.langlinks enll
  ON enll.ll_from = enp.page_id and ll_lang="de"
JOIN dewiki_p.page dep
  ON enll.ll_title = REPLACE(dep.page_title, '_', ' ')
WHERE enp.page_namespace = 0 and enp.page_is_redirect = 0 
  AND dep.page_namespace = 0 and dep.page_is_redirect = 0
GROUP BY enp.page_id  
HAVING dep.page_len > enp.page_len
ORDER BY dep.page_len DESC  
LIMIT 100

Upvotes: 1

Related Questions