kairu

Reputation: 11

Python requests irritatingly slow

I made a program to find available /id/ custom URLs on Steam using requests, but it takes a very long time. If anybody knows a way to make the requests faster, please let me know.



import requests

w = open("not taken.txt", "a")
f = open("og_users.txt", "r")

def is_steam_customurl_taken(id):
    r = requests.get("https://steamcommunity.com/id/%s" % id)
    if ("The specified profile could not be found.".lower() in r.text.lower()):
        return False
    return True

lines = f.readlines()
for line in lines:
    username = line.strip()
    if is_steam_customurl_taken(username):
        print("%s is taken" % username)
    if not is_steam_customurl_taken(username):
        w.write(username)
        w.write("\n")
        print("%s is not taken" % username)
w.close()
f.close()

Upvotes: 1

Views: 138

Answers (2)

dchang

Reputation: 2027

If you have Steam IDs, see about obtaining a Steam Web API key and use a proper API (some sites have measures to detect and block web-scrapers). Their API has a players endpoint which allows you to submit 100 IDs per request.
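
As a rough sketch of the batching idea, the snippet below splits a list of numeric 64-bit Steam IDs into groups of 100 and sends one request per group. The endpoint (ISteamUser/GetPlayerSummaries), the `key` and `steamids` parameters, and the `response` / `players` JSON layout are assumptions on my part, since the answer does not name the endpoint; `chunked` and `fetch_player_summaries` are hypothetical helper names, and the 10-second timeout is arbitrary.

```python
import requests

# Assumed endpoint; the answer only says "a players endpoint".
SUMMARIES_URL = "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v2/"


def chunked(items, size=100):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_player_summaries(api_key, steam_ids, session=None):
    """Look up Steam IDs in batches of 100 and return the player records."""
    http = session or requests.Session()
    players = []
    for batch in chunked(list(steam_ids)):
        r = http.get(
            SUMMARIES_URL,
            params={"key": api_key, "steamids": ",".join(batch)},
            timeout=10,  # arbitrary value, not from the answer
        )
        r.raise_for_status()
        # Assumed response layout: {"response": {"players": [...]}}
        players.extend(r.json()["response"]["players"])
    return players
```

Note that this works on numeric Steam IDs only, so it does not by itself tell you whether a custom /id/ name is free.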

If you only have the names, though, try the xml=1 query parameter (e.g. https://steamcommunity.com/id/eroticgaben?xml=1) for a much lighter response.
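
A minimal sketch of that suggestion, assuming the XML response for a missing profile still contains the same "could not be found" message the question checks for (worth verifying against a name you know is free). `fetch_profile_xml` and `is_taken_from_xml` are hypothetical helper names, and the timeout is arbitrary.

```python
import requests

NOT_FOUND = "the specified profile could not be found."


def fetch_profile_xml(custom_id, session=None):
    """Fetch the lighter XML version of a profile page and return its body."""
    http = session or requests
    r = http.get(
        "https://steamcommunity.com/id/%s" % custom_id,
        params={"xml": 1},  # sent as ?xml=1
        timeout=10,         # arbitrary value
    )
    return r.text


def is_taken_from_xml(body):
    """Return False when the body carries the 'not found' message."""
    # Assumption: the XML error text matches the HTML one.
    return NOT_FOUND not in body.lower()
```

Calling `is_taken_from_xml(fetch_profile_xml(username))` would then replace the check in the question.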

Upvotes: 1

alecxe

Reputation: 474141

Your bottlenecks here are, basically, two things:

  • network
  • the fact that you are processing usernames one by one synchronously, in a blocking manner. In other words, you do not start on the next username until you are done with the current one.

There are a couple of easy wins you can get to improve your current "synchronous" approach:

  • instantiate a requests.Session() and reuse it for all your network requests. This should speed things up significantly, since you are making requests to the same host:

    if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase

  • do not call is_steam_customurl_taken() twice per username. Call it once and store the result in a variable:

    is_taken = is_steam_customurl_taken(username)
    if is_taken:
        print("%s is taken" % username)
    else:
        w.write(username + "\n")
        print("%s is not taken" % username)
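
Putting both easy wins together, the loop could look roughly like the sketch below. It keeps the file names and the "not found" check from the question; passing the session as a parameter, the `check_usernames` wrapper, the `custom_id` argument name and the 10-second timeout are my own choices rather than anything from the original code.

```python
import requests

PROFILE_URL = "https://steamcommunity.com/id/%s"
NOT_FOUND = "the specified profile could not be found."


def is_steam_customurl_taken(session, custom_id):
    """Return True if the custom URL resolves to an existing profile."""
    # session.get() reuses the pooled connection to steamcommunity.com.
    r = session.get(PROFILE_URL % custom_id, timeout=10)
    return NOT_FOUND not in r.text.lower()


def check_usernames(in_path="og_users.txt", out_path="not taken.txt"):
    """Check every name in in_path and append the free ones to out_path."""
    # One Session for the whole run; the with-block closes it and both files.
    with requests.Session() as session, open(in_path) as f, open(out_path, "a") as w:
        for line in f:
            username = line.strip()
            if not username:
                continue  # skip blank lines
            # Single request per username, result stored once.
            is_taken = is_steam_customurl_taken(session, username)
            if is_taken:
                print("%s is taken" % username)
            else:
                w.write(username + "\n")
                print("%s is not taken" % username)
```

Calling `check_usernames()` runs the whole job; this still processes names one at a time, so it only removes the per-request connection setup and the duplicate request.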
    

As far as making things asynchronous and non-blocking, you can look into packages like grequests or Scrapy which would allow you to not wait on the network and process more usernames at a time.
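
As a lighter-weight alternative to grequests or Scrapy (a substitution on my part, not what the answer recommends), the standard library's thread pool can overlap the network waits. In this sketch `check_concurrently` is a hypothetical helper, `check` is whatever function performs the lookup for one username, and the default of 10 workers is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def check_concurrently(usernames, check, max_workers=10):
    """Run check(username) for every username on a thread pool.

    Returns a list of (username, result) pairs in input order.
    """
    usernames = list(usernames)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order while the calls overlap in time.
        results = list(pool.map(check, usernames))
    return list(zip(usernames, results))
```

You would pass the function that does the HTTP check as `check` and write out the names whose result is False. It is probably wise to keep `max_workers` modest, since the site may throttle or block aggressive clients, and to confirm whether sharing one requests.Session across threads is safe for your use before doing so.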

Upvotes: 3
