Reputation: 1163
I'm wanting to run a query on the global database for events, language and tone (GDELT) using BigQuery. I am specifically searching for key terms (e.g. like bribery or corruption) and grouping the count of the results by year (2010,2011,2012 etc). I am able to group by year and count occurrences with the .full_events
GDELT (see example 2), but not with the .gkg_partitioned GDELT (example 1). The SQL code for example 1 is valid and runs, however, it does not seem to produce the expected results based on what I know/expect. I think it has something to do with the way I am specifying the dates, but don't really know. Could someone please help explain why and/or correct whatever is wrong? Thanks in advance.
below code will use up a bunch of GB, so change the terms as necessary)
(example 1 : code doesnt produce the expected results for .gkg partitioned)
SELECT DATE, count(*)
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE (V2Themes LIKE '%corruption%'
OR V2Themes LIKE '%bribery%'
AND DATE>=20100101000000 and _PARTITIONTIME >= TIMESTAMP("2010-01-01") AND DATE<=20151231999999
and _PARTITIONTIME <= TIMESTAMP("2015-12-31")
GROUP BY DATE
(example 2: .full_events code does produce the expected results)
(SELECT YEAR, ActionGeo_CountryCode, COUNT(*) c
FROM `gdelt-bq.full.events`
WHERE ActionGeo_CountryCode IS NOT NULL
and YEAR >2010
and YEAR <= 2020
and QuadClass=4
GROUP BY 1, 2)
Upvotes: 0
Views: 436
Reputation: 1269563
I'm not 100% sure what your problem is, but the WHERE
clause does not look correct. I would expect:
WHERE (V2Themes LIKE '%corruption%' OR V2Themes LIKE '%bribery%') AND
DATE >= 20100101000000 AND
DATE <= 20151231999999 AND
_PARTITIONTIME >= TIMESTAMP('2010-01-01') AND
_PARTITIONTIME <= TIMESTAMP('2015-12-31')
Note the parentheses around the OR
conditions.
Upvotes: 0