Talib Daryabi
Talib Daryabi

Reputation: 773

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

I am trying to get a JSON response from the link used as a parameter to the urllib request. but it gives me an error that it can't contain control characters.

how can I solve the issue?

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
    
source = urllib.request.urlopen(start_url).read()

the error I get is :

URL can't contain control characters. '/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq=' (found at least ' ')

Upvotes: 13

Views: 52153

Answers (8)

A.Prabhu
A.Prabhu

Reputation: 5

Please check the proxy address is there any Blank Space,

Due to blank space, we are getting this error.

Upvotes: -2

skinnedpanda
skinnedpanda

Reputation: 91

Parsing the url first and then encoding the url elements would work.

import urllib.request
from urllib.parse import urlparse, quote

def make_safe_url(url: str) -> str:
    """
    Returns a parsed and quoted url
    """
    _url = urlparse(url)
    url = _url.scheme + "://" + _url.netloc + quote(_url.path) + "?" + quote(_url.query)
    return url

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
start_url = make_safe_url(start_url)
source = urllib.request.urlopen(start_url).read()

The code returns the JSON-document despite the double forward-slash and the whitespace in the url.

Upvotes: 1

Manuel
Manuel

Reputation: 2542

Replacing whitespace with:

url = url.replace(" ", "%20")

if the problem is with the whitespace.

Upvotes: 28

Kevin
Kevin

Reputation: 100

Solr search strings can get pretty weird. Better use the 'quote' method to encode characters before making the request. See example below:

from urllib.parse import quote

start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
    
source = urllib.request.urlopen(quote(start_url)).read()

Better later than never...

Upvotes: 4

Anderson
Anderson

Reputation: 21

You probably already found out by now but let's get it written here.

There can't be any space character in the URL, and there are 2, after bundle_fq e dm_field_deadlineTo_fq

Remove those and you're good to go

Upvotes: 2

Daweo
Daweo

Reputation: 36660

Spaces are not allowed in URL, I removed them and it seems to be working now:

import urllib.request
start_url = "https://devbusiness.un.org/solr-sitesearch-output/10//0/ds_field_last_updated/desc?bundle_fq =procurement_notice&sm_vid_Institutions_fq=&sm_vid_Procurement_Type_fq=&sm_vid_Countries_fq=&sm_vid_Sectors_fq= &sm_vid_Languages_fq=English&sm_vid_Notice_Type_fq=&deadline_multifield_fq=&ts_field_project_name_fq=&label_fq=&sm_field_db_ref_no__fq=&sm_field_loan_no__fq=&dm_field_deadlineFrom_fq=&dm_field_deadlineTo_fq =&ds_field_future_posting_dateFrom_fq=&ds_field_future_posting_dateTo_fq=&bm_field_individual_consulting_fq="
url = start_url.replace(" ","")
source = urllib.request.urlopen(url).read()

Upvotes: 6

wasif
wasif

Reputation: 15498

You need to encode the control characters inside the URL. Especially spaces need to be encoded to %20.

Upvotes: 0

RMPR
RMPR

Reputation: 3531

Like the error message says, there are some control characters in your url, which doesn't seem to be a valid one by the way.

Upvotes: 0

Related Questions