thehand0
thehand0

Reputation: 1163

Strange results for BigQuery/GDELT. How to correctly parse dates?

I'm wanting to run a query on the global database for events, language and tone (GDELT) using BigQuery. I am specifically searching for key terms (e.g. like bribery or corruption) and grouping the count of the results by year (2010,2011,2012 etc). I am able to group by year and count occurrences with the .full_events GDELT (see example 2), but not with the .gkg_partitioned GDELT (example 1). The SQL code for example 1 is valid and runs, however, it does not seem to produce the expected results based on what I know/expect. I think it has something to do with the way I am specifying the dates, but don't really know. Could someone please help explain why and/or correct whatever is wrong? Thanks in advance.

below code will use up a bunch of GB, so change the terms as necessary)

(example 1 : code doesnt produce the expected results for .gkg partitioned)

SELECT DATE, count(*)
 FROM `gdelt-bq.gdeltv2.gkg_partitioned`
 WHERE (V2Themes LIKE '%corruption%'
  OR V2Themes LIKE '%bribery%'
  AND DATE>=20100101000000 and _PARTITIONTIME >= TIMESTAMP("2010-01-01") AND DATE<=20151231999999 
and _PARTITIONTIME <= TIMESTAMP("2015-12-31")
  GROUP BY DATE

(example 2: .full_events code does produce the expected results)

(SELECT YEAR, ActionGeo_CountryCode, COUNT(*) c
 FROM `gdelt-bq.full.events`
 WHERE ActionGeo_CountryCode IS NOT NULL 
 and YEAR >2010
 and YEAR <= 2020
 and QuadClass=4
  GROUP BY 1, 2)

Upvotes: 0

Views: 436

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

I'm not 100% sure what your problem is, but the WHERE clause does not look correct. I would expect:

WHERE (V2Themes LIKE '%corruption%' OR V2Themes LIKE '%bribery%') AND
      DATE >= 20100101000000 AND
      DATE <= 20151231999999 AND
     _PARTITIONTIME >= TIMESTAMP('2010-01-01') AND 
     _PARTITIONTIME <= TIMESTAMP('2015-12-31')

Note the parentheses around the OR conditions.

Upvotes: 0

Related Questions