rocky

Reputation: 7098

Converting Python complex string output like (-0-0j) into an equivalent complex string

In Python, I'd like a good way to convert its complex number string output into an equivalent string representation which, when interpreted by Python, gives the same value.

Basically, I'd like a function complexStr2str(s: str) -> str with the property that eval(complexStr2str(str(c))) is indistinguishable from c, for any c whose value is of type complex. However, complexStr2str() only has to deal with the kinds of string patterns that str() or repr() output for complex values. Note that for complex values str() and repr() do the same thing.

By "indistinguishable" I don't mean == in the Python sense; you can define (or redefine) that to mean anything you want. "Indistinguishable" means that if a string a in a program represents some value, and you replace it in the program with string b (which could be exactly a), then there is no way to tell the running original program from the running replacement program, short of introspecting the program.

Note that (-0-0j) is not the same thing as -0j although the former is what Python will output for str(-0j) or repr(-0j). As shown in the interactive session below, -0j has real and imaginary float parts -0.0 while -0-0j has real and imaginary float parts positive 0.0.

The problem is made even more difficult in the presence of values like nan and inf. Although in Python 3.5 and later you can import these values from math, for various reasons I'd like to avoid having to do that. However, using float("nan") is okay.

Consider this Python session:

>>> -0j
(-0-0j)
>>> -0j.imag
-0.0
>>> -0j.real
-0.0
>>> (-0-0j).imag
0.0  # this is not -0.0
>>> (-0-0j).real
0.0  # this is also not -0.0
>>> eval("-0-0j")
0j # and so this is not the same as -0j
>>> from math import atan2
>>> atan2(-0.0, -1.0)
-3.141592653589793
>>> atan2((-0-0j).imag, -1.0)
3.141592653589793
>>> -1e500j
(-0-infj)
>>> (-0-infj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'infj' is not defined

Addendum:

This question has generated something of a stir (e.g. there are a number of downvotes for this question and its accepted solution). And there have been a lot of edits to the question, so some of the comments might be out of date.

The main thrust of the criticism is that one shouldn't want to do this. Parsing data from text from some existing program is a thing that happens all the time, and sometimes you just can't control the program that generated the data.

A related problem, where one can control the program producing the output but the values must still appear as text, is to write a better repr() function that handles floats and complex numbers properly and follows the principle described at the end. That is straightforward, if a little ugly, because a complete version also needs to handle float and complex values inside composite types like lists, tuples, sets, and dictionaries.

Finally, I'll say that it appears that Python's str() or repr() output for complex values is unhelpful, which is why this problem is more specific to Python than other languages that support complex numbers as a primitive datatype or via a library.

Here is a session that shows this:

>>> complex(-0.0, -0.0)
(-0-0j)  # confusing and can lead to problems if eval'd
>>> repr(complex(-0.0, -0.0))
'(-0-0j)' # 'complex(-0.0, -0.0)' would be the simplest, clearest, and most useful

Note that str() gets called when doing output such as via print(). repr() is the preferred method for this kind of use, but here it is the same as str(), and both have problems with things like inf and nan.

For any built-in type, eval(repr(c)) should be indistinguishable from c.
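A minimal sketch of the "better repr()" idea mentioned in the addendum. The names roundtrip_float and roundtrip_repr are made up for illustration, only lists and tuples are handled as composite types, and the sign and payload of a NaN are not preserved:

```python
def roundtrip_float(x):
    """Return source text that evaluates back to the float x."""
    if x != x:  # NaN is the only float that is not equal to itself
        return "float('nan')"
    if x in (float('inf'), float('-inf')):
        return "float('inf')" if x > 0 else "float('-inf')"
    return repr(x)  # repr keeps the sign of zero, e.g. '-0.0'


def roundtrip_repr(value):
    """Like repr(), but float and complex parts are spelled so eval() restores them."""
    if isinstance(value, complex):
        return "complex({}, {})".format(
            roundtrip_float(value.real), roundtrip_float(value.imag))
    if isinstance(value, float):
        return roundtrip_float(value)
    if isinstance(value, list):
        return "[" + ", ".join(roundtrip_repr(v) for v in value) + "]"
    if isinstance(value, tuple):
        inner = ", ".join(roundtrip_repr(v) for v in value)
        # A one-element tuple needs its trailing comma to stay a tuple.
        return "(" + inner + ("," if len(value) == 1 else "") + ")"
    return repr(value)  # sets, dicts and other types are not handled in this sketch


print(roundtrip_repr(complex(-0.0, -0.0)))
print(roundtrip_repr([1.5, (complex(float('inf'), -0.0),)]))
```

The output never contains a bare inf or nan name and never relies on binary + or - between the parts, so evaluating it goes through complex(real, imag) with two floats.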

Upvotes: 2

Views: 1608

Answers (3)

wim

Reputation: 363043

This question is based on a false premise. To correctly preserve signed zeros, nan, and infinity when using complex numbers, you should use the function call rather than binops:

complex(real, imag)

It should be called with two floats:

>>> complex(-0., -0.)  # correct usage
(-0-0j)
>>> complex(-0, -0j)  # incorrect usage
-0j

Your problem with attempting to eval the literals is that -0-0j is not actually a complex literal. It is a binary op, subtraction of a complex 0j from an integer 0. The integer first had a unary sub applied, but that was a no-op for the integer zero.

The parser will reveal this:

>>> ast.dump(ast.parse("-0-0j"))
'Module(body=[Expr(value=BinOp(left=UnaryOp(op=USub(), operand=Constant(value=0, kind=None)), op=Sub(), right=Constant(value=0j, kind=None)))], type_ignores=[])'

Python's choices here will make more sense if you understand how the tokenizer works: it does not want to backtrack.

$ echo "-0-0j" > wtf.py
$ python -m tokenize wtf.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,1:            OP             '-'            
1,1-1,2:            NUMBER         '0'            
1,2-1,3:            OP             '-'            
1,3-1,5:            NUMBER         '0j'           
1,5-1,6:            NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''

But you can reason it yourself easily too, from the datamodel hooks and operator precedence:

>>> -0-0j  # this result seems weird at first
0j
>>> -(0) - (0j)  # but it's parsed like this
0j
>>> (0) - (0j)  # unary op (0).__neg__() applies first, does nothing
0j
>>> (0).__sub__(0j)  # left-hand side asked to handle first, but opts out
NotImplemented
>>> (0j).__rsub__(0)  # right-hand side gets second shot, reflected op works
0j

The same reasoning applies to -0j. It's actually a negation, and the real part is implicitly negated too:

>>> -0j  # where did the negative zero real part come from?
(-0-0j)
>>> -(0j)  # actually parsed like this
(-0-0j)
>>> (0j).__neg__()  # so *both* real and imag parts are negated
(-0-0j)

Let's talk about this part, which points the blame in the wrong direction:

"Python's str() representation for complex numbers with negative real and imaginary parts is unhelpful"

No, there is nothing incorrect about the implementation of __str__ here, and your use of complex(-0,-0j) makes me suspect you didn't fully understand what's going on in the first place. Firstly, there is never a reason to write -0, because there is no signed zero for integers, only floats. And that imaginary part -0j is still parsed as a USub on a complex, as I've explained above. Usually you wouldn't pass an imaginary number itself as the imaginary part here; the right way to call complex is just with two floats: complex(-0., -0.). No surprises here.
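One way to check which zeros you actually got is math.copysign, which reports the sign bit even for zero. This is only an illustrative sketch, and zero_signs is a made-up helper name:

```python
import math


def zero_signs(z):
    """Return the signs of the real and imaginary parts as +1.0 or -1.0."""
    return (math.copysign(1.0, z.real), math.copysign(1.0, z.imag))


# Built from two floats, each part keeps exactly the sign it was given.
print(zero_signs(complex(-0.0, -0.0)))  # (-1.0, -1.0)
print(zero_signs(complex(-0.0, 0.0)))   # (-1.0, 1.0)
print(zero_signs(complex(0.0, -0.0)))   # (1.0, -1.0)

# Unary minus on a complex value negates both parts.
print(zero_signs(-(0j)))                # (-1.0, -1.0)
```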

Whilst I'll agree that the parsing/eval of complex expressions is counter-intuitive, I have to disagree that there is anything amiss in their string representation. The suggestion to "improve" on the eval of expressions may be possible, with the goal of making eval(repr(c)) round-trip exactly, but it would mean that you cannot use Python's left-to-right munching parser any more. That parser is fast, simple, and easy to explain. It is not a fair trade-off to greatly complicate the parse trees for the purpose of making expressions involving complex zeros behave less strangely, when nobody who needs to care about such details should be choosing repr(c) as their serialization format in the first place.

Note that ast.literal_eval only allows it as a convenience. ast.literal_eval("0+0j") will work despite not being a literal, and the other way around will fail:

>>> ast.literal_eval("0+0j")
0j
>>> ast.literal_eval("0j+0")
ValueError: malformed node or string: <_ast.BinOp object at 0xcafeb4be>

In conclusion, the string representation of complex numbers is fine. It's the way that you create the numbers that matters. str(c) is intended for human-readable output; use a machine-friendly serialization format if you care about preserving signed zeros, nan, and infinities.
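As one possible example of a machine-friendly text format (my own choice for illustration, not the only option), each part can be written with float.hex() and read back with float.fromhex(). The names dump_complex and load_complex are hypothetical:

```python
import math


def dump_complex(z):
    """Serialize a complex number as two hex-float strings separated by a space."""
    return "{} {}".format(z.real.hex(), z.imag.hex())


def load_complex(text):
    """Rebuild the complex number from the output of dump_complex()."""
    real_text, imag_text = text.split()
    return complex(float.fromhex(real_text), float.fromhex(imag_text))


original = complex(-0.0, float('-inf'))
restored = load_complex(dump_complex(original))

# The sign of the zero real part and the infinite imaginary part both survive.
print(math.copysign(1.0, restored.real), restored.imag)
```

The parts are stored separately and recombined with complex(real, imag), so no arithmetic expression is ever evaluated and signed zeros stay intact.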

Upvotes: 8

kaya3

Reputation: 51083

As @wim has noted in the comments, this is probably not the right solution to the real problem; it would be better to not have converted those complex numbers to strings via str in the first place. It's also quite unusual to care about the difference between positive and negative zero. But I can imagine rare situations where you do care about that difference, and getting access to the complex numbers before they get str()'d isn't an option; so here's a direct answer.

We can match the parts with a regex; [+-]?(?:(?:[0-9.]|[eE][+-]?)+|nan|inf) is a bit loose for matching floating point numbers, but it will do. We need to use str(float(...)) on the matched parts to make sure they are safe as floating point strings; so e.g. '-0' gets mapped to '-0.0'. We also need special cases for infinity and NaN, so they are mapped to the executable Python code "float('...')" which will produce the right values.

import re

FLOAT_REGEX = r'[+-]?(?:(?:[0-9.]|[eE][+-]?)+|nan|inf)'
COMPLEX_PATTERN = re.compile(r'^\(?(' + FLOAT_REGEX + r'\b)?(?:(' + FLOAT_REGEX + r')j)?\)?$')

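def complexStr2str(s):
    """Convert str(complex) output into an eval-able 'complex(real, imag)' string."""
    m = COMPLEX_PATTERN.match(s)
    if not m:
        raise ValueError('Invalid complex literal: ' + s)

    def safe_float(t):
        # A missing part (e.g. no real part in '100j') counts as positive zero.
        t = str(float(0 if t is None else t))
        # inf and nan are not Python names, so emit float('...') calls instead.
        if t in ('inf', '-inf', 'nan'):
            t = "float('" + t + "')"
        return t

    real, imag = m.group(1), m.group(2)
    return 'complex({0}, {1})'.format(safe_float(real), safe_float(imag))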

Example:

>>> complexStr2str(str(complex(0.0, 0.0)))
'complex(0.0, 0.0)'
>>> complexStr2str(str(complex(-0.0, 0.0)))
'complex(-0.0, 0.0)'
>>> complexStr2str(str(complex(0.0, -0.0)))
'complex(0.0, -0.0)'
>>> complexStr2str(str(complex(-0.0, -0.0)))
'complex(-0.0, -0.0)'
>>> complexStr2str(str(complex(float('inf'), float('-inf'))))
"complex(float('inf'), float('-inf'))"
>>> complexStr2str(str(complex(float('nan'), float('nan'))))
"complex(float('nan'), float('nan'))"
>>> complexStr2str(str(complex(1e100, 1e-200)))
'complex(1e+100, 1e-200)'
>>> complexStr2str(str(complex(1e-100, 1e200)))
'complex(1e-100, 1e+200)'

Examples for string inputs:

>>> complexStr2str('100')
'complex(100.0, 0.0)'
>>> complexStr2str('100j')
'complex(0.0, 100.0)'
>>> complexStr2str('-0')
'complex(-0.0, 0.0)'
>>> complexStr2str('-0j')
'complex(0.0, -0.0)'
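To check the round-trip property systematically, a small harness along these lines could be used. It is only a sketch: roundtrip_failures and same_float are made-up names, and the harness takes the converter as a parameter so that complexStr2str can be passed in. It is demonstrated here with an identity converter, which shows why the raw str() output is not enough:

```python
import math


def same_float(a, b):
    """True if a and b are both NaN, or equal with the same sign (so 0.0 != -0.0)."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def roundtrip_failures(convert, samples):
    """Return str(c) for every sample where eval(convert(str(c))) differs from c."""
    failures = []
    for c in samples:
        text = str(c)
        try:
            result = eval(convert(text))
        except Exception:
            failures.append(text)  # e.g. NameError for 'infj'
            continue
        ok = (isinstance(result, complex)
              and same_float(result.real, c.real)
              and same_float(result.imag, c.imag))
        if not ok:
            failures.append(text)
    return failures


PARTS = (0.0, -0.0, 1.5, float('inf'), float('-inf'), float('nan'))
SAMPLES = [complex(r, i) for r in PARTS for i in PARTS]

# With no conversion at all, signed zeros, inf and nan fail to round-trip.
print(roundtrip_failures(lambda s: s, SAMPLES))
```

Passing complexStr2str instead of the identity lambda is the intended use. An empty result would mean every sample round-trips under this comparison, which ignores the sign and payload of NaN.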

Upvotes: 1

ParkerD

Reputation: 1390

Because the eval(repr(c)) method doesn't work for complex types, using pickle is the most reliable way to serialize the data:

import pickle


numbers = [
    complex(0.0, 0.0),
    complex(-0.0, 0.0),
    complex(0.0, -0.0),
    complex(-0.0, -0.0),
]
serialized = [pickle.dumps(n) for n in numbers]

for n, s in zip(numbers, serialized):
    print(n, pickle.loads(s))

Output:

0j 0j
(-0+0j) (-0+0j)
-0j -0j
(-0-0j) (-0-0j)

Upvotes: 2
