Youssef

Reputation: 1495

Is there a way to access search results from PyPI without scraping?

I'm working on a GUI for managing Python virtual environments. So far I have been able to implement most of the features I wanted to provide to the user, but I'm stuck on one thing:

While creating a virtual environment the users can install packages into it if they want. For this I would like to let them perform a search like pip search <package> from the command line. The results will be displayed in a table view. The problem I have is that I'm not sure what is the best way to get the search results.

I tried using the built-in subprocess module to run pip search and populate the table with the results. This is possible, but it's quite tricky, because I first have to format the output (package name, version, description) to fit the table.

Because this requires a lot of nested loops and string manipulation, I looked for a way to access the data directly, ideally without having to scrape the Python Package Index.


EDIT:

I considered using PyPI's XML-RPC API, but there's a note that it's going to be deprecated in the future and is not recommended for use, so I'm unsure if I should use it in my project.

The XML-RPC API will be deprecated in the future. Use of this API is not recommended, and existing consumers of the API should migrate to the RSS and/or JSON APIs instead.

Users of this API are strongly encouraged to subscribe to the pypi-announce mailing list for notices as we begin the process of removing XML-RPC from PyPI.

Is there another way to get the search results from PyPI or is the XML-RPC API the only one at the moment?

Upvotes: 4

Views: 2169

Answers (2)

LoneWanderer

Reputation: 3341

To expand on Antti Haapala's answer:

  • As of Jan. 2021, https://status.python.org provides some meaningful information about the ongoing pip errors; see the quotes below.
  • Anyone running a pip search command (tested with pip 20.3.3) may encounter the following error message: xmlrpc.client.Fault: <Fault -32500: "RuntimeError: PyPI's XMLRPC API has been temporarily disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.">
  • pip install your_package still works

See also https://stackoverflow.com/a/65485498/7237062
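
For a GUI, it may be worth catching this fault instead of letting it propagate. The following is only a minimal sketch, assuming the XML-RPC endpoint at https://pypi.org/pypi and its search method are called directly with the standard-library xmlrpc.client; the function name try_xmlrpc_search and the optional client parameter are hypothetical, added here for illustration and testing.

```python
import xmlrpc.client

# Assumption: PyPI's XML-RPC endpoint URL.
PYPI_XMLRPC_URL = "https://pypi.org/pypi"


def try_xmlrpc_search(query, client=None):
    """Return (results, error_message); exactly one of the two is None.

    `client` is a hypothetical hook so a stub can be passed in for testing.
    """
    if client is None:
        client = xmlrpc.client.ServerProxy(PYPI_XMLRPC_URL)
    try:
        # This is the call that currently fails while search is disabled.
        return client.search({"name": query}), None
    except xmlrpc.client.Fault as fault:
        # Turn the server-side fault into a message the GUI can display.
        message = (
            f"PyPI search unavailable (fault {fault.faultCode}): "
            f"{fault.faultString}"
        )
        return None, message
```

While the endpoint stays disabled, the second element of the returned tuple would carry the message quoted above, which could be shown in place of the results table.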


An important part of the above error message is (emphasis mine):

PyPI's XMLRPC API [...] will be deprecated in the near future


Quoting https://status.python.org: (I do not intend to update this post any further, just provide some context.)

Update - We are continuing to monitor for any further issues.

Dec 28, 13:51 UTC

Update - The XMLRPC Search endpoint remains disabled due to ongoing request volume. As of this update, there has been no reduction in inbound traffic to the endpoint from abusive IPs and we are unable to re-enable the endpoint, as it would immediately cause PyPI service to degrade again.

Dec 28, 13:50 UTC

Update - The XMLRPC Search endpoint is still disabled due to ongoing request volume. As of this update, there has been no reduction in inbound traffic to the endpoint from abusive IPs and we are unable to re-enable the endpoint, as it would immediately cause PyPI service to degrade again. We are working with the abuse contact at the owner of the IPs and trying to make contact with the maintainers of whatever tool is flooding us via other channels.

Dec 23, 14:54 UTC

Update - The XMLRPC Search endpoint is still disabled due to ongoing request volume. As of this update, there has been no reduction in inbound traffic to the endpoint from abusive IPs and we are unable to re-enable the endpoint, as it would immediately cause PyPI service to degrade again. We are working with the abuse contact at the owner of the IPs and trying to make contact with the maintainers of whatever tool is flooding us via other channels.

Dec 15, 20:59 UTC

Monitoring - With the temporary disabling of XMLRPC we are hoping that the mass consumer that is causing us trouble will make contact. Due to the huge swath of IPs we were unable to make a more targeted block without risking more severe disruption, and were not able to receive a response from their abuse contact or direct outreach in an actionable time frame.

Dec 14, 17:46 UTC

Update - Due to the overwhelming surges of inbound XMLRPC search requests (and growing) we will be temporarily disabling the XMLRPC search endpoint until further notice.

Dec 14, 17:30 UTC

Identified - We've identified that the issue is with excess volume to our XLMRPC search endpoint that powers pip search among other tools. We are working to try to identify patterns and prohibit abusive clients to retain service health.

Dec 14, 15:09 UTC

Investigating - PyPI's search backends are experiencing an outage causing the backends to timeout and fail, leading to degradation of service for the web app. Uploads and installs are currently unaffected but logged in actions and search via the web app and API access via XMLRPC are currently experiencing partial outages.

Dec 14, 09:41 UTC

Upvotes: 3

The XML-RPC search endpoint was temporarily disabled in mid-December 2020 because of the ever-increasing request load on it. As of now, it is not possible to search packages on pypi.org through an API at all.
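
This is not a replacement for search, but if the exact package name is already known, the JSON API mentioned in the deprecation notice can at least supply the columns for the table. A minimal sketch, assuming the https://pypi.org/pypi/<name>/json endpoint and the name, version and summary keys under "info"; the helper names extract_row and fetch_package_row are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: per-project JSON endpoint; it looks up one exact name, it does not search.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def extract_row(payload):
    """Build a (name, version, description) table row from a decoded JSON response."""
    info = payload["info"]
    # "summary" may be missing or null, so fall back to an empty string.
    return info["name"], info["version"], info.get("summary") or ""


def fetch_package_row(name, timeout=10):
    """Download the metadata for one exact package name and return its table row."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=timeout) as response:
        return extract_row(json.load(response))
```

Calling fetch_package_row("requests") would need network access and raises urllib.error.HTTPError if no project with that exact name exists, so the GUI would have to handle that case.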

Upvotes: 7
