I am trying to extract the citation network of a list of papers from PubMed using Biopython's Entrez.elink. With efetch I was able to submit a list of PMIDs, but elink only lets me query multiple PMIDs when I use elink(db='pubmed', id=",".join(list_of_PMIDs)). Otherwise it only returns the cited papers for the first paper in the list. Furthermore, the results that elink returns are out of order and have no dividers, i.e. it is impossible to tell where the citations of one paper end and the next begin.
Code
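from Bio import Entrez

pmidlist = ['20675860', '17338551']
# Joining the PMIDs sends them to elink as one comma-separated id value
links = Entrez.elink(id=",".join(pmidlist), linkname="pubmed_pubmed")
record = Entrez.read(links)
# Print the links found in the first LinkSet of the parsed result
print(record[0]['LinkSetDb'][0]['Link'])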
Returned list
[{'Id': '20675860'}, {'Id': '17338551'}, {'Id': '18512960'}, {'Id': '15485804'}, {'Id': '16682405'}, {'Id': '17635932'}, {'Id': '17517655'}, {'Id': '16519522'}, {'Id': '29024026'}, {'Id': '19088188'}, {'Id': '29170487'}, {'Id': '18391193'}, {'Id': '18311969'}, {'Id': '12819771'}, {'Id': '12887903'}, {'Id': '12514135'}...........
As you can see, the two queried PMIDs come first in the list, and there is no way to tell where one paper's citation list ends and the other's begins.
How can I get the citation list without having to query every PMID individually?
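One possible approach, sketched under assumptions rather than confirmed for every Biopython version: the NCBI E-utilities documentation describes a "by id" mode in which each PMID is sent as its own id parameter, and ELink then returns one LinkSet per input ID instead of a single merged one. The sketch below assumes that passing a Python list as id makes Entrez.elink send the IDs that way. The helper names group_links and fetch_links are hypothetical, not part of Biopython. Note also that pubmed_pubmed is, as far as I know, the "similar articles" link, which may be why the queried PMIDs show up in their own results; pubmed_pubmed_refs and pubmed_pubmed_citedin are the link names usually associated with citations.

```python
def group_links(linksets, linkname="pubmed_pubmed"):
    """Map each queried PMID to the IDs linked from it.

    `linksets` is the parsed elink result: a list of LinkSet mappings,
    each holding an 'IdList' and zero or more 'LinkSetDb' entries.
    """
    grouped = {}
    for linkset in linksets:
        source_ids = [str(i) for i in linkset.get("IdList", [])]
        linked = []
        for linkdb in linkset.get("LinkSetDb", []):
            if linkdb.get("LinkName") == linkname:
                linked.extend(str(link["Id"]) for link in linkdb["Link"])
        # In by-id mode each LinkSet should hold exactly one source PMID;
        # if several arrive merged, they are kept together under one key.
        grouped[",".join(source_ids)] = linked
    return grouped


def fetch_links(pmids, linkname="pubmed_pubmed"):
    """Hypothetical wrapper: one elink request, results split per PMID."""
    from Bio import Entrez  # requires Biopython; set Entrez.email before calling

    # Assumption: a list (not a comma-joined string) is sent as separate
    # id parameters, so the response contains one LinkSet per PMID.
    handle = Entrez.elink(dbfrom="pubmed", db="pubmed",
                          id=list(pmids), linkname=linkname)
    try:
        return group_links(Entrez.read(handle), linkname)
    finally:
        handle.close()
```

Calling fetch_links(['20675860', '17338551']) would then be expected to return a dict keyed by PMID. If a key comes back as two PMIDs joined by a comma, the IDs were still merged into one LinkSet and the by-id assumption does not hold for that setup.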