BaRud

Reputation: 3230

Can't call curl from python3

I am trying to call this curl command from python3. From bash, it works fine:

curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802

yielding the expected result:

 @article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}

In python3, I am doing:

import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
    print("DOI is not available")
    self.Messages.on_warn_clicked("DOI is not given",
                                  "Search google instead")

which gives this error:

<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

What's going wrong here?

Upvotes: 1

Views: 267

Answers (1)

Jean-François Fabre

Reputation: 140307

You have 3 problems here:

  1. Don't quote your arguments in subprocess. Since you pass a list of arguments rather than an unsplit command line, each element already reaches the program as a single argument, and subprocess quotes it for you where the platform requires it (good practice, keep doing it, but drop the unnecessary quoting).
  2. subprocess.call does not let you capture or parse the output in Python, which is a problem because of number 3.
  3. Lastly, the site randomly answers with rubbish HTML (a Java stack trace). This explains why you get different output in Python, but you can get it in bash as well.
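
To illustrate point 2, here is a minimal sketch of the difference between subprocess.call, which only returns the exit status, and subprocess.run with capture_output=True, which hands the output back to Python. The child process is the Python interpreter itself, an assumption made purely so the sketch runs without network access; the same pattern should apply to curl.

```python
import subprocess
import sys

# Stand-in child process (assumption: any command that prints to stdout behaves the same way).
cmd = [sys.executable, "-c", "print('hello from child')"]

# call() returns only the exit status; the output is discarded here to keep the demo quiet.
status = subprocess.call(cmd, stdout=subprocess.DEVNULL)

# run() with capture_output=True (Python 3.7+) stores stdout and stderr on the result object.
result = subprocess.run(cmd, capture_output=True, text=True)
captured = result.stdout.strip()
print(status, repr(captured))
```

With the output in a variable, it can be inspected before deciding whether the request succeeded.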

Problem #1

subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])

should be

subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])

Otherwise the quotes become part of the argument itself: curl receives a header that literally starts and ends with a double quote, which it does not expect, and the server rejects the request.

Demo of the broken quoting:
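
One way to see what bash actually hands to curl is to split the original command line with shlex, which follows POSIX shell quoting rules. This is only an illustrative sketch: it runs nothing, it just shows that the double quotes group the header into one argument and are then removed.

```python
import shlex

bash_line = ('curl -LH "Accept: text/bibliography; style=bibtex" '
             'http://dx.doi.org/10.1103/PhysRevLett.117.126802')

# The quotes delimit the argument in the shell; they are not part of it.
argv = shlex.split(bash_line)
for arg in argv:
    print(repr(arg))
```

The third element printed is the header without any surrounding quotes, which is exactly what the list passed to subprocess should contain.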

import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"

#### this is wrong because of the quoting ####
p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# stderr is merged into stdout, so the second value returned by communicate() is None
output, _ = p.communicate()
print(output)

result:

b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'

Problems #2 and #3

I have implemented a retry mechanism which parses the output and retries until correct output is found:

import subprocess,sys
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"

while True:
    p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi],stdout=subprocess.PIPE)
    # stderr is not piped, so the second value returned by communicate() is None
    output, _ = p.communicate()
    # decode as UTF-8: latin-1 garbles the non-ASCII characters in the author names
    output = output.decode("utf-8")
    if "java.util.concurrent.FutureTask.run" in output:
        # site crashed when responding: junk HTML output: retry
        sys.stderr.write("Wrong answer: retrying\n")
    else:
        print(output)
        break

result:

Wrong answer: retrying   <==== here the site threw a big HTML exception output
 @article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}

So it works; it's just a site problem. With this Python wrapper you can resubmit the request until it yields the proper answer.
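
If the site stays broken, the loop above never terminates. A possible variation, sketched here under assumptions, caps the number of attempts and also checks curl's exit status. The names fetch_bibtex, max_tries and run are hypothetical and not part of the original answer; run defaults to subprocess.run and is injectable so the logic can be exercised without network access. The extra -s flag only silences curl's progress meter.

```python
import subprocess
import sys

# Marker of the junk HTML response, same string as in the loop above.
ERROR_MARKER = "java.util.concurrent.FutureTask.run"

def fetch_bibtex(doi_url, max_tries=5, run=subprocess.run):
    """Return the BibTeX text for doi_url, or None after max_tries failures.

    Hypothetical helper: the names and the default of 5 tries are assumptions.
    """
    cmd = ["curl", "-sLH", "Accept: text/bibliography; style=bibtex", doi_url]
    for attempt in range(1, max_tries + 1):
        result = run(cmd, stdout=subprocess.PIPE)
        text = result.stdout.decode("utf-8", errors="replace")
        # Accept the answer only if curl succeeded and no stack trace is present.
        if result.returncode == 0 and ERROR_MARKER not in text:
            return text
        sys.stderr.write("Wrong answer (attempt %d): retrying\n" % attempt)
    return None
```

Calling fetch_bibtex(doi) then either returns the entry or None, so the caller can fall back to another lookup instead of hanging.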

Upvotes: 1
