LEBRON L
LEBRON L

Reputation: 17

Acquire patents data from BigQuery by application

I want to collect data by application_number like this.

SELECT p.application_number AS app, COUNT(c.publication_number) AS Citations 
     FROM 'patents-public-data.patents.publications' AS p, UNNEST(citation) AS c 
     WHERE p.application_number IN ('CN201510747352'
         ) 
     GROUP BY p.application_number

But it can't work. The url is the patent page. Who can do me a favor? patent_application_number

Upvotes: 1

Views: 98

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173121

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  p.application_number AS app, 
  SUM((SELECT COUNT(publication_number) FROM UNNEST(citation))) AS Citations 
FROM `patents-public-data.patents.publications` AS p
WHERE p.application_number IN ('CN-201510747352-A') 
GROUP BY p.application_number   

with result

Row app                 Citations    
1   CN-201510747352-A   14      

Please note: your original query will work if you will use CN-201510747352-A instead of CN201510747352 as below

#standardSQL
SELECT p.application_number AS app, COUNT(c.publication_number) AS Citations 
FROM `patents-public-data.patents.publications` AS p, 
UNNEST(citation) AS c 
WHERE p.application_number IN ('CN-201510747352-A') 
GROUP BY p.application_number    

but still - I recommend you using the query I provided - the reason is - if given application has no citation at all - such application will not be returned in output whereas recommended query will return count = 0

for example - if you will comment out WHERE clause in both queries - first will return 76,073,734; while second will return 29,489,639 apps.

It might be not that important in this particular use case - but good to have in mind for your next queries

Another question is the number queried is 14 which is not the same as 7 in original website. Is there any mistakes in it?

7 is correct answer - see below

#standardSQL
SELECT 
  p.application_number AS app, 
  COUNT(DISTINCT c.publication_number) Citations 
FROM `patents-public-data.patents.publications` AS p,
UNNEST(citation) c
WHERE p.application_number IN ('CN-201510747352-A') 
GROUP BY p.application_number     

with result

Row app                 Citations    
1   CN-201510747352-A   7    

Upvotes: 1

Related Questions