Reputation: 484
I'm working with the following string:
'"name": "Gnosis", \n "symbol": "GNO", \n "rank": "99", \n "price_usd": "175.029", \n "price_btc": "0.0186887", \n "24h_volume_usd": "753877.0"'
and I have to use re.sub() in Python to replace only the double quotes (") that are enclosing the numbers, in order to parse it later as JSON. I've tried with some regular expressions, but without success. Here is my best attempt:
exp = re.compile(r': (")\D+\.*\D*(")', re.MULTILINE)
response = re.sub(exp, "", string)
I've searched a lot for a similar problem, but have not found another similar question.
Finally I've used (thanks to S. Kablar):
formatted = re.sub(r'"(-*\d+(?:\.\d+)?)"', r"\1", string)
parsed = json.loads(formatted)
The problem is that this endpoint returns a badly formatted string as JSON.
Other users answered "Parse the string first with json, and later convert numbers to float" with a for loop, which I think is a very inefficient way to do it. It also forces you to choose between int and float for every value in your response. To settle the question, I've written this gist where I compare the different approaches with benchmarks, and for now I'm going to trust regex in this case.
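For anyone reading later, here is a minimal sketch of the regex route end to end. It assumes the payload is wrapped in braces (as in the answer that uses json.loads) and uses -? rather than -* for the sign; the names QUOTED_NUMBER and unquote_numbers are only illustrative.

```python
import json
import re

# Sample payload from the question, wrapped in braces so it is a JSON object.
raw = ('{"name": "Gnosis", \n "symbol": "GNO", \n "rank": "99", \n '
       '"price_usd": "175.029", \n "price_btc": "0.0186887", \n '
       '"24h_volume_usd": "753877.0"}')

# A quoted integer or decimal with an optional leading minus sign.
QUOTED_NUMBER = re.compile(r'"(-?\d+(?:\.\d+)?)"')

def unquote_numbers(text):
    """Strip the quotes around numeric-looking strings (hypothetical helper)."""
    return QUOTED_NUMBER.sub(r"\1", text)

parsed = json.loads(unquote_numbers(raw))

# json picks the type itself: "99" becomes an int, "175.029" a float.
print(type(parsed["rank"]).__name__, type(parsed["price_usd"]).__name__)
```

Two things to keep in mind with this sketch: a value with leading zeros such as "007" would be unquoted into something that is not valid JSON, and a key made only of digits would lose its quotes as well.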
Thanks everyone for your help
Upvotes: 6
Views: 6251
Reputation: 3405
Regex: "(-?\d+(?:[\.,]\d+)?)"
Substitution: \1
Details:
() Capturing group
(?:) Non capturing group
\d Matches a digit (equal to [0-9])
+ Matches between one and unlimited times
? Matches between zero and one times
\1 Group 1.
Python code:
import re

def remove_quotes(text):
    return re.sub(r"\"(-?\d+(?:[\.,]\d+)?)\"", r'\1', text)

remove_quotes('"percent_change_7d": "-23.43"')  # >> "percent_change_7d": -23.43
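One caveat if the result is going straight into json.loads: the [.,] alternative also unquotes values that use a decimal comma, and those are not valid JSON numbers. A small sketch to check this, using a made-up "price" key:

```python
import json
import re

def remove_quotes(text):
    # Same pattern as above: the decimal separator may be a point or a comma.
    return re.sub(r'"(-?\d+(?:[.,]\d+)?)"', r"\1", text)

# Hypothetical input with a decimal comma.
converted = remove_quotes('{"price": "1,5"}')

try:
    json.loads(converted)
    comma_is_valid = True
except json.JSONDecodeError:
    # The parser reads 1, then a comma, then expects a new key.
    comma_is_valid = False
```

If the endpoint only ever uses a decimal point, dropping the comma from the character class avoids this case.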
Upvotes: 8
Reputation: 42788
Parse the string first with json, and later convert numbers to floats:
import json

string = '{"name": "Gnosis", \n "symbol": "GNO", \n "rank": "99", \n "price_usd": "175.029", \n "price_btc": "0.0186887", \n "24h_volume_usd": "753877.0"}'
data = json.loads(string)
response = {}
for key, value in data.items():
    try:
        # Digit-only strings become ints, everything else is tried as a float.
        value = int(value) if value.strip().isdigit() else float(value)
    except ValueError:
        # Not a number (e.g. "Gnosis"): keep the original string.
        pass
    response[key] = value
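If the response can contain nested objects, the same idea can be moved into an object_hook, which json.loads calls for every decoded object. This is only a sketch: to_number and convert_numbers are illustrative names, and the nested "quote" object is made up for the example.

```python
import json

def to_number(value):
    """Return an int or float for numeric-looking strings, else the value unchanged."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

def convert_numbers(obj):
    # Called by json.loads for each decoded JSON object, innermost first.
    return {key: to_number(value) for key, value in obj.items()}

text = '{"rank": "99", "price_usd": "175.029", "quote": {"change": "-23.43"}, "name": "Gnosis"}'
data = json.loads(text, object_hook=convert_numbers)
```

Trying int before float keeps "99" as an int without a separate isdigit() check. Note that float() also accepts strings such as "nan" and "inf", and that strings inside lists are not touched by this hook.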
Upvotes: 2
Reputation: 57463
You came close. You want to save the numbers, and the colon, so you need to put them in parentheses, not the rest. Also, numbers are \d, not \D (that would be not-numbers).
So:
exp = re.compile(r'(: *)"(\d+\.?\d*)"', re.MULTILINE)
response = re.sub(exp, "\\1\\2", string)
\d+\.?\d* means "one or more digits, a point (or not), any number of digits".
The above doesn't cover ".125", which is no digits, one point, then digits.
And if you changed to "\d*\.?\d*", that would match ".", since it is "any digits, a point (or not), any digits". It would even match an empty string.
I think the only practicable way is
(\d+\.?\d*|\.\d+)
with | meaning "or": so, either digits optionally followed by one point and any digits (this matches "17."), or a point followed by at least one digit. A simpler "\d+\.?\d+" would not do, unfortunately: it needs at least two digits, so it does not match "5".
Or you specify all three cases:
(\d+|\d+\.?\d+|\.\d+)
First integers (\d+), then digits with an optional point followed by more digits, then decimal parts alone without a leading zero. Note that this version no longer matches "17.".
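A quick way to compare the variants is to run each one with re.fullmatch over a few sample inputs. This is just a sketch; the labels and the sample list are mine.

```python
import re

# The three number patterns discussed above, without the surrounding quotes.
PATTERNS = {
    "single": r"\d+\.?\d*",
    "two alternatives": r"\d+\.?\d*|\.\d+",
    "three alternatives": r"\d+|\d+\.?\d+|\.\d+",
}

SAMPLES = ["5", "17.", "175.029", ".125", "."]

# For each pattern, collect the samples it matches in full.
results = {
    name: [s for s in SAMPLES if re.fullmatch(pattern, s)]
    for name, pattern in PATTERNS.items()
}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

None of the three accepts a lone ".". They differ only on ".125" (rejected by the single pattern) and "17." (rejected by the three-alternative version).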
Upvotes: 1