tim

Reputation: 3608

Pull the citations for a paper from Google Scholar using R

Using Google Scholar and R, I'd like to find out who is citing a particular paper.

The existing packages (such as scholar) are oriented towards h-index analyses, i.e. statistics about a researcher.

I want to give a target paper as input. An example URL would be:

https://scholar.google.co.uk/scholar?oi=bibs&hl=en&cites=12939847369066114508
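The URL above already carries the key piece: the cites= ID. A minimal sketch in Python (the language of the SerpApi example in the answers) that pulls that ID out and builds one URL per results page, assuming Scholar pages its citation list through a start offset in steps of 10. The helper names cites_id_from_url and page_urls are hypothetical, and actually fetching the pages is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def cites_id_from_url(url: str) -> str:
    """Return the value of the `cites` query parameter in a Scholar URL."""
    return parse_qs(urlparse(url).query)["cites"][0]


def page_urls(cites_id: str, n_results: int, page_size: int = 10):
    """Build one URL per results page.

    Assumes Scholar paginates with a `start` offset of `page_size` (10) per page.
    """
    urls = []
    for start in range(0, n_results, page_size):
        params = {"hl": "en", "cites": cites_id, "start": start}
        urls.append("https://scholar.google.co.uk/scholar?" + urlencode(params))
    return urls


example = "https://scholar.google.co.uk/scholar?oi=bibs&hl=en&cites=12939847369066114508"
cid = cites_id_from_url(example)
for u in page_urls(cid, 30):
    print(u)
```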

Then R should scrape the citation pages for the paper (Google Scholar paginates them), returning an array of the papers that cite the target (up to 500 or more citations). Then we'd search the titles for keywords, tabulate journals and citing authors, etc.
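Once the citing papers are in a list, the analysis step is simple counting. A minimal Python sketch, assuming each citing paper has already been reduced to a dict with title, journal, and authors keys; the records below are invented sample data, and titles_with_keyword and tabulate are hypothetical helpers.

```python
from collections import Counter

# Invented sample records: one dict per citing paper, however they were collected.
papers = [
    {"title": "Structural equation models in ecology", "journal": "Ecology",
     "authors": ["A Smith", "B Jones"]},
    {"title": "A note on latent variable models", "journal": "Psychometrika",
     "authors": ["B Jones"]},
    {"title": "Structural equation modeling of traits", "journal": "Ecology",
     "authors": ["C Lee"]},
]


def titles_with_keyword(records, keyword):
    """Return the titles that contain `keyword`, ignoring case."""
    kw = keyword.lower()
    return [r["title"] for r in records if kw in r["title"].lower()]


def tabulate(records):
    """Count citing papers per journal and per author."""
    journals = Counter(r["journal"] for r in records)
    authors = Counter(a for r in records for a in r["authors"])
    return journals, authors


print(titles_with_keyword(papers, "structural"))
journals, authors = tabulate(papers)
print(journals.most_common())
print(authors.most_common())
```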

Any clues as to how to do that? Or is it down to literally scraping each page (which I can do by copy and paste for one-off operations)?
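If it does come down to scraping, the parsing half can be done with Python's standard html.parser on a page that has already been saved (for example, via the copy-and-paste route). A minimal sketch, assuming each result title sits in an h3 element with class gs_rt and the authors/venue line in a div with class gs_a; these class names are an assumption about Scholar's undocumented markup, which can change. ScholarPageParser is a hypothetical name, and fetching pages is deliberately left out.

```python
from html.parser import HTMLParser


class ScholarPageParser(HTMLParser):
    """Collect result titles and author/venue lines from one saved results page.

    Assumes <h3 class="gs_rt"> holds the title and <div class="gs_a"> the
    authors/venue line; Scholar's markup is undocumented and may change.
    """

    TARGETS = {("h3", "gs_rt"): "titles", ("div", "gs_a"): "meta"}

    def __init__(self):
        super().__init__()
        self.titles, self.meta = [], []
        self._field = None  # name of the list the current capture goes into
        self._tag = None    # tag name of the element being captured
        self._depth = 0     # nesting depth of that tag name
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1  # same tag nested inside the capture
            return
        classes = (dict(attrs).get("class") or "").split()
        for (t, cls), field in self.TARGETS.items():
            if tag == t and cls in classes:
                self._field, self._tag, self._depth, self._buf = field, tag, 1, []
                break

    def handle_endtag(self, tag):
        if self._field is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            getattr(self, self._field).append(text)
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)


sample = """
<div class="gs_ri">
  <h3 class="gs_rt"><a href="#">Paper <b>one</b></a></h3>
  <div class="gs_a">A Author, B Author - Some Journal, 2020 - example.org</div>
</div>
<div class="gs_ri">
  <h3 class="gs_rt"><span>[PDF]</span> <a href="#">Paper two</a></h3>
  <div class="gs_a">C Author - Other Journal, 2021 - example.org</div>
</div>
"""
p = ScholarPageParser()
p.feed(sample)
print(p.titles)
print(p.meta)
```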

This seems like it would be a generally useful function, e.g. for seeding systematic reviews, so someone adding it to a package might well increase their H :-)

Upvotes: 12

Views: 3965

Answers (2)

Milos Djurdjevic

Reputation: 410

Alternatively, you could use a third-party solution like SerpApi. It's a paid API with a free trial. We handle proxies, solve captchas, and parse all the rich structured data for you.

Example Python code (client libraries for other languages are also available):

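# pip install google-search-results
from serpapi import GoogleSearch

params = {
  "api_key": "secret_api_key",      # your SerpApi key
  "engine": "google_scholar",
  "hl": "en",
  "cites": "12939847369066114508"   # the cites= ID from the question's URL
}

search = GoogleSearch(params)
results = search.get_dict()          # parsed JSON response as a Python dict
citing = results["organic_results"]  # one dict per citing paper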
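Each call returns a single page of results, so collecting 500+ citations means paging. A minimal sketch, assuming the start (result offset) and num (results per page, up to 20) parameters of SerpApi's Google Scholar engine and that results arrive under organic_results; check both against the documentation. fetch_all_citing is a hypothetical helper, and its fetch argument lets you swap in a stub for testing. Note that every page is a separate search against your quota.

```python
def fetch_all_citing(cites_id, api_key, max_results=500, page_size=20, fetch=None):
    """Collect citing-paper results from SerpApi's google_scholar engine, page by page.

    `fetch` takes a params dict and returns the parsed JSON as a dict. By default
    it wraps GoogleSearch; pass a stub instead to test without network access.
    """
    if fetch is None:
        from serpapi import GoogleSearch  # provided by the google-search-results package

        def serpapi_fetch(params):
            return GoogleSearch(params).get_dict()

        fetch = serpapi_fetch

    collected = []
    start = 0
    while len(collected) < max_results:
        params = {
            "api_key": api_key,
            "engine": "google_scholar",
            "hl": "en",
            "cites": cites_id,
            "start": start,     # result offset (assumed SerpApi parameter)
            "num": page_size,   # results per page (assumed maximum: 20)
        }
        page = fetch(params).get("organic_results", [])
        if not page:
            break               # no more results
        collected.extend(page)
        if len(page) < page_size:
            break               # a short page is assumed to be the last one
        start += page_size
    return collected[:max_results]


# Usage (needs a real key and spends one search per page):
# citing = fetch_all_citing("12939847369066114508", "secret_api_key")
```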

Example JSON output (one entry from the organic_results list):

{
  "position": 1,
  "title": "Lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA)",
  "result_id": "HYlMgouq9VcJ",
  "type": "Pdf",
  "link": "https://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf",
  "snippet": "Abstract In this document, we illustrate the use of lavaan by providing several examples. If you are new to lavaan, this is the first document to read … 3.1 Entering the model syntax as a string literal … 3.2 Reading the model syntax from an external file …",
  "publication_info": {
    "summary": "Y Rosseel - Journal of statistical software, 2012 - users.ugent.be",
    "authors": [
      {
        "name": "Y Rosseel",
        "link": "https://scholar.google.com/citations?user=0R_YqcMAAAAJ&hl=en&oi=sra",
        "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=0R_YqcMAAAAJ&engine=google_scholar_author&hl=en",
        "author_id": "0R_YqcMAAAAJ"
      }
    ]
  },
  "resources": [
    {
      "title": "ugent.be",
      "file_format": "PDF",
      "link": "https://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf"
    }
  ],
  "inline_links": {
    "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=HYlMgouq9VcJ",
    "cited_by": {
      "total": 10913,
      "link": "https://scholar.google.com/scholar?cites=6338159566757071133&as_sdt=2005&sciodt=0,5&hl=en",
      "cites_id": "6338159566757071133",
      "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=2005&cites=6338159566757071133&engine=google_scholar&hl=en"
    },
    "related_pages_link": "https://scholar.google.com/scholar?q=related:HYlMgouq9VcJ:scholar.google.com/&scioq=&hl=en&as_sdt=2005&sciodt=0,5",
    "versions": {
      "total": 27,
      "link": "https://scholar.google.com/scholar?cluster=6338159566757071133&hl=en&as_sdt=2005&sciodt=0,5",
      "cluster_id": "6338159566757071133",
      "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=2005&cluster=6338159566757071133&engine=google_scholar&hl=en"
    },
    "cached_page_link": "https://scholar.googleusercontent.com/scholar?q=cache:HYlMgouq9VcJ:scholar.google.com/&hl=en&as_sdt=2005&sciodt=0,5"
  }
},
...
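To get from this nested JSON to the journal and author tables the question asks for, each entry can be flattened into a simple record. A minimal sketch; flatten is a hypothetical helper, and it assumes publication_info.summary follows the "authors - venue, year - source" layout seen in the example, which is not guaranteed for every result. The sample is a trimmed copy of the result shown above.

```python
def flatten(result):
    """Reduce one result dict (shaped like the JSON above) to a flat record."""
    info = result.get("publication_info", {})
    # Assumed summary layout: "authors - venue, year - source"; not guaranteed.
    parts = [p.strip() for p in info.get("summary", "").split(" - ")]
    venue, year = "", ""
    if len(parts) >= 3:
        head, _, tail = parts[1].rpartition(",")
        if tail.strip().isdigit():
            venue, year = head.strip(), tail.strip()
        else:
            venue = parts[1]
    return {
        "title": result.get("title", ""),
        # Only authors that SerpApi lists individually (may not be all of them).
        "authors": [a["name"] for a in info.get("authors", [])],
        "venue": venue,
        "year": year,
        "cited_by": result.get("inline_links", {}).get("cited_by", {}).get("total", 0),
    }


# Trimmed copy of the example result above.
sample = {
    "title": "Lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA)",
    "publication_info": {
        "summary": "Y Rosseel - Journal of statistical software, 2012 - users.ugent.be",
        "authors": [{"name": "Y Rosseel", "author_id": "0R_YqcMAAAAJ"}],
    },
    "inline_links": {"cited_by": {"total": 10913, "cites_id": "6338159566757071133"}},
}
print(flatten(sample))
```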

Check out the documentation for more details.

Disclaimer: I work at SerpApi.

Upvotes: 1

Ulises Rosas-Puchuri

Reputation: 1970

Although Google offers a number of APIs, there is no official Google Scholar API. So, although a web crawler for Google Scholar pages might not be difficult to develop, I do not know to what extent it might be illegal. Check this.

Upvotes: 1
