splintercell

Reputation: 575

Unicode Python string u"''" not evaluating as empty

Not sure if this is a rookie mistake or plain stupid, but I am facing this strange issue. I have a Unicode string declared as classifier = u"''", which I am checking for emptiness. The following code block:

if classifier:
    # do something
    pass
else:
    # else do something else
    pass

will hit the if block rather than the else block, since the string is not actually empty: it contains the two embedded quote characters ''. I don't have control over the source generating the classifier string.

If classifier could somehow be reduced to whatever sits inside the embedded '', I could check that for emptiness, but I'm not sure how. If it is of any help, classifier is collected from an HttpRequest object: classifier = request.GET.get('c', '').

EDIT:

classifier[1:-1] returns u'', which can now be checked for emptiness. Is there a built-in method one can use for this?

I will go ahead with this approach for now, but I'm leaving the post open for any other pointers.

thanks,

Upvotes: 0

Views: 94

Answers (3)

abarnert

Reputation: 365925

You have to actually know what the data means before you can decide how to parse it. Just randomly hacking at it until it works for one example isn't going to help.

So, you're getting the string out of a URL, and it looks like this:

http:///a=maven&v=1.1.0&classifier=''&ttype=pom

Normally, when given a URL, the right thing to do is call urlparse.urlparse and then call urlparse.parse_qs on the query. But that won't actually help here, because this is not actually a valid URL.

Well, it is a valid URL, but it's one with a path <someurl>/a=maven&v=1.1.0&classifier=''&ttype=pom, not one with a path <someurl>/ and a query a=maven&v=1.1.0&classifier=''&ttype=pom. You need a ? to set off the query.
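To make the path-versus-query distinction concrete, here is a minimal sketch using Python 3's urllib.parse, which is where the urlparse functions mentioned above live in Python 3. The host example.com is a placeholder assumed for illustration, since the host in the original URL is elided.

```python
from urllib.parse import urlparse, parse_qs

# example.com is a placeholder host, not the one from the question.
no_query = urlparse("http://example.com/a=maven&v=1.1.0&classifier=''&ttype=pom")
with_query = urlparse("http://example.com/?a=maven&v=1.1.0&classifier=''&ttype=pom")

# Without a "?", everything after the host is treated as the path.
print(repr(no_query.path))
print(repr(no_query.query))

# With a "?", the same text is parsed as a query string.
print(repr(with_query.path))
print(parse_qs(with_query.query))
```

Note that even with the ? in place, the parsed classifier value is still the two-character string '', so the quoting problem remains.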

And, on top of that, the query is clearly not generated correctly. You don't quote empty strings in a query. You don't quote anything; you percent-escape special characters (and entity-escape ampersands only when the URL is embedded in HTML). So, unless the URL literally means that the classifier is '' rather than the empty string, it's wrong.
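For comparison, this is roughly what a correctly generated query looks like when built with Python 3's urllib.parse.urlencode. It is only a sketch: the parameter names are taken from the example URL, and the value "a&b c" is made up to show escaping.

```python
from urllib.parse import urlencode, parse_qs

# Parameter names come from the example URL; an empty classifier is just "".
params = {"a": "maven", "v": "1.1.0", "classifier": "", "ttype": "pom"}
query = urlencode(params)
print(query)  # a=maven&v=1.1.0&classifier=&ttype=pom

# keep_blank_values=True preserves the empty classifier when parsing back.
parsed = parse_qs(query, keep_blank_values=True)
print(parsed["classifier"])  # ['']

# A made-up value with special characters is escaped, not quoted.
special = urlencode({"classifier": "a&b c"})
print(special)  # classifier=a%26b+c
```

An empty value is simply written as classifier= with nothing after the equals sign, so no quoting is needed to represent it.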

And, if it weren't wrong, you wouldn't be asking these questions.

If you have any control over how these URLs are getting generated, obviously you want to get that fixed. If you can't control it, but at least know how they're being generated, you can write code to reverse that to get the original values. But if you don't even know that, you have to guess.

You ideally need more than one example to guess. Are they quoting just empty strings, or are they also, e.g., quoting strings with " characters or spaces or ampersands in them? If it's the latter, you can probably just strip("'"), but if it's the former, that will be incorrect in any cases where the original data actually has quotes.
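To illustrate why the guess matters, here is a small sketch comparing strip("'") with removing exactly one surrounding pair of quotes. Both helper names are hypothetical, and the sample values are invented; which behavior is right depends on how the URLs are actually generated.

```python
def strip_all_quotes(value):
    """Remove every leading and trailing single quote (what strip("'") does)."""
    return value.strip("'")


def unwrap_one_pair(value):
    """Remove one surrounding pair of single quotes, if present."""
    if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
        return value[1:-1]
    return value


# Invented sample values; the last one has quotes that belong to the data.
for raw in ["''", "'sources'", "sources", "''quoted''"]:
    print(repr(raw), repr(strip_all_quotes(raw)), repr(unwrap_one_pair(raw)))
```

Both helpers turn '' into the empty string, but they disagree on ''quoted'': strip("'") discards all the quotes, while the pairwise version keeps the inner ones.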

Upvotes: 1

XORcist

Reputation: 4367

if len(classifier) > 2:
    # more than just the two quote characters: do something
    pass
else:
    # empty, or nothing beyond the quotes: do something else
    pass

Upvotes: 1

dwikle

Reputation: 6978

You could do this:

if classifier.strip("'"):
    # something is left after removing the surrounding quotes: do something
    pass
else:
    # nothing left: do something else
    pass

Upvotes: 2
