Using BioPython, How To Print DOI References In A Single Line (Comma-Delimited) For A Given Pair of Search Terms, Instead Of In Multiple Lines?

Question

People of StackOverflow, first of all, thanks for your patience. I understand this is my third thread on the subject, but I'm getting nowhere and don't even know where to start (I don't know what I don't know), so I thought I'd ask here anyway. I'm trying to pull references from PMC using Biopython and write them back into a CSV file that contains, among other things, the plant name, the associated disease or condition it treats (or its medicinal action), and the DOI URLs of the articles that mention the given plant-disease pair. After many hours of trying to understand what to do, and of discussing the code with people much more experienced than myself, this is what I ended up with in Visual Studio Code:

  for plant, disease in plant_disease_list:
    search_query = generate_search_query(plant, disease)
    handle1 = Entrez.esearch(db="pmc", term=search_query, retmax="10")
    record1 = Entrez.read(handle1)
    pubmed_ids = record1.get("IdList")
    if len(pubmed_ids)==0:
      print("{}, {}, None".format(plant, disease))
    else:
      for pubmed_id in pubmed_ids:
        handle2 = Entrez.esummary(db="pmc", id=pubmed_id)
        records = Entrez.read(handle2)
        for record in records:
          doi = record.get("DOI")
          if doi is None:
            print("{}, {}".format(plant, disease))
          else:
            doi_main = doi.split()
            string = "http://doi.org/"
            to_add = (",").join((string + x) for x in doi_main)
            print("{}, {},".format(plant, disease), to_add, sep="")

where generate_search_query was previously defined as:

def generate_search_query(plant, disease):
  search_query = '"{}" AND "{}"'.format(plant, disease)
  return search_query

This is the output I'm getting:

Asystasia salicifalia, Puerperal illness, None
Asystasia salicifalia, Puerperium, None
Asystasia salicifalia, Puerperal disorder, None
Barleria strigosa, Tonic
Justicia procumbens, Lumbago, None
Justicia procumbens, Itching,http://doi.org/10.1673/031.012.0501
Strobilanthes auriculata, Malnutrition, None
Thunbergia laurifolia, Detoxificant, None
Thunbergia similis, Tonic, None
Lannea coromandelica, Dizziness,http://doi.org/10.3897/phytokeys.102.24380
Lannea coromandelica, Dizziness,http://doi.org/10.1186/s13002-016-0089-8
Lannea coromandelica, Dizziness,http://doi.org/10.1186/s13002-015-0033-3
Spondias pinnata, Flatulence,http://doi.org/10.1016/j.heliyon.2019.e02768
Spondias pinnata, Flatulence,http://doi.org/10.1186/s13002-019-0287-2
Spondias pinnata, Flatulence,http://doi.org/10.1186/s13002-018-0248-1
Spondias pinnata, Flatulence,http://doi.org/10.3897/phytokeys.102.24380
Spondias pinnata, Flatulence,http://doi.org/10.1155/2018/5382904
Spondias pinnata, Flatulence,http://doi.org/10.1186/s13002-016-0089-8
Spondias pinnata, Flatulence,http://doi.org/10.1186/s13002-015-0033-3
Spondias pinnata, Flatulence,http://doi.org/10.1186/1472-6882-13-243
Spondias pinnata, Flatulence,http://doi.org/10.1186/1472-6882-10-77
Holarrhena pubescens, Diarrhoea,http://doi.org/10.5455/javar.2019.f379
Holarrhena pubescens, Diarrhoea,http://doi.org/10.1155/2019/2321961
Holarrhena pubescens, Diarrhoea,http://doi.org/10.1186/s12906-018-2348-9
Traceback (most recent call last):
  File "scraperscript_python.py", line 33, in <module>
    handle2 = Entrez.esummary(db="pmc", id=pubmed_id)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\site-packages\Bio\Entrez\__init__.py", line 334, in esummary
    return _open(cgi, variables)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\site-packages\Bio\Entrez\__init__.py", line 569, in _open
    handle = _urlopen(cgi)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 1362, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 1319, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 966, in send
    self.connect()
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1422, in connect
    server_hostname=server_hostname)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 423, in wrap_socket
    session=session
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 870, in _create
    self.do_handshake()
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
KeyboardInterrupt

I interrupted the rest of the output myself (hence the KeyboardInterrupt), because I don't want the script to run on the whole data while it prints in the wrong form. As the Spondias pinnata / Flatulence example shows, each DOI URL is printed on its own line. I don't want that, because it makes it extremely difficult to merge the results back into the original data. This CSV file has only 65 entries, but there are datasets with more than 8000 entries, so fixing the layout by hand is not realistic. For the plant-disease pair mentioned above, the output I want looks like this:

Spondias pinnata, Flatulence, http://doi.org/10.1016/j.heliyon.2019.e02768, http://doi.org/10.1186/s13002-019-0287-2, http://doi.org/10.1186/s13002-018-0248-1, http://doi.org/10.3897/phytokeys.102.24380, http://doi.org/10.1155/2018/5382904, http://doi.org/10.1186/s13002-016-0089-8, http://doi.org/10.1186/s13002-015-0033-3, http://doi.org/10.1186/1472-6882-13-243, http://doi.org/10.1186/1472-6882-10-77
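For illustration, the single-line layout above can be sketched by collecting every DOI for a pair into one list first and joining it only once. This is a minimal sketch, not a tested fix for the script: the helper names doi_urls_from_summaries and format_row are hypothetical, and fake_summaries is a hand-written stand-in for the parsed esummary records (it only assumes, as the code in the question does, that a record may carry a "DOI" key).

```python
DOI_PREFIX = "http://doi.org/"


def doi_urls_from_summaries(records):
    """Return one DOI URL per DOI found in the given summary records."""
    urls = []
    for record in records:
        doi = record.get("DOI")
        if doi:  # skip records without a DOI
            urls.extend(DOI_PREFIX + part for part in doi.split())
    return urls


def format_row(plant, disease, urls):
    """Build one comma-delimited line; 'None' marks a pair with no DOIs."""
    fields = [plant, disease] + (urls if urls else ["None"])
    return ", ".join(fields)


# Hypothetical stand-in for the summaries of ONE plant-disease pair
fake_summaries = [
    {"DOI": "10.1155/2018/5382904"},
    {"Title": "record without a DOI"},
    {"DOI": "10.1186/1472-6882-10-77"},
]

row = format_row(
    "Spondias pinnata", "Flatulence", doi_urls_from_summaries(fake_summaries)
)
print(row)
```

In the loop from the question, this would mean extending a single list inside the `for pubmed_id in pubmed_ids` loop and calling print once, after that loop has finished, rather than once per record.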

Someone from my family suggested that I use a nested dictionary, but I don't see how (or whether) that would help, where it would go in the code, or what changes the already heavily nested loops would need. Any help with this would be greatly appreciated. Thank you.
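As a rough sketch of the dictionary idea: a nested dictionary is probably more than is needed, since a flat dictionary keyed by the (plant, disease) tuple already groups the DOIs per pair. The triples list below is hypothetical sample data shaped like the lines the script currently prints. Writing through the standard csv module is an assumption about how the results would go back into the file; it quotes the DOI field so that its internal commas do not create extra columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical (plant, disease, doi_url) triples, one per currently printed line
triples = [
    ("Lannea coromandelica", "Dizziness", "http://doi.org/10.3897/phytokeys.102.24380"),
    ("Spondias pinnata", "Flatulence", "http://doi.org/10.1155/2018/5382904"),
    ("Lannea coromandelica", "Dizziness", "http://doi.org/10.1186/s13002-016-0089-8"),
]

# Group the URLs under their plant-disease pair
dois_by_pair = defaultdict(list)
for plant, disease, url in triples:
    dois_by_pair[(plant, disease)].append(url)

# Write one row per pair; an in-memory buffer stands in for the real CSV file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for (plant, disease), urls in dois_by_pair.items():
    writer.writerow([plant, disease, ", ".join(urls)])

csv_text = buffer.getvalue()
print(csv_text)
```

Because the key is the pair itself, each row can later be matched against the original data by looking up (plant, disease) directly.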
