T T

Reputation: 43

Regular expressions in Python

I have the following string:

WA2ąą-02 -7+12,7. PP-.5P x0.6 words

and I need to count the words, count the numbers, and compute the sum of all the numbers using regular expressions.

Words:

WA2ąą-02
-7+12,7. 
PP-.5P
x0.6
words

Numbers:

2
-2
-7
12
7
-0.5
0.6

Sum of numbers should be 12.1.

I wrote this code, but only the word count works correctly:

import re

string = "WA2ąą-02 -7+12.7. PP-.5P x0.6    word"

# regular expressions
regex1 = r'\S+'
regex2 = r'-?\b\d+(?:[,\.]\d*)?\b'

count_words = len(re.findall(regex1, string))
count_numbers = len(re.findall(regex2, string))
sum_numbers = sum([float(i) for i in re.findall(regex2, string)])

print("\n")
print("String:", string)
print("\n")
print("Count words:", count_words)
print("Count numbers:", count_numbers)
print("Sum numbers:", sum_numbers)
print("\n")
input("Press enter to exit")

Output:

Count words: 5
Count numbers: 4
Sum numbers: 9.7

Upvotes: 4

Views: 81

Answers (2)

rock321987

Reputation: 11032

The following regex seems to work fine:

([-+]?[\.]?(?=\d)(?:\d*)(?:\.\d+)?)

Python Code

import re

p = re.compile(r'([-+]?[\.]?(?=\d)(?:\d*)(?:\.\d+)?)')
test_str = u"WA2ąą-02 -7+12,7. PP-.5P x0.6 words"
print(sum([float(x) for x in re.findall(p, test_str)]))

Ideone Demo
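
To make it easier to see what the pattern above actually picks up, here is a minimal sketch that prints the raw matches before summing them. The names `pattern`, `text`, `tokens` and `numbers` are only illustrative, and the sum is rounded on the assumption that two decimal places are enough for this input.

```python
import re

# Same pattern as above; the outer capturing group is kept as in the answer.
pattern = re.compile(r'([-+]?[\.]?(?=\d)(?:\d*)(?:\.\d+)?)')
text = "WA2ąą-02 -7+12,7. PP-.5P x0.6 words"

tokens = pattern.findall(text)          # raw matched substrings
numbers = [float(t) for t in tokens]    # float() accepts forms like '-02', '+12', '-.5'

print(tokens)                           # ['2', '-02', '-7', '+12', '7', '-.5', '0.6']
print(len(numbers))                     # 7
print(round(sum(numbers), 2))           # 12.1
```

Note that `12,7.` is split into `+12` and `7`, because neither the comma nor the trailing dot is treated as part of a number, which matches the list of numbers in the question.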

UPDATE FOR HEX

The following regex seems to work (assuming the hex numbers in the string have no fractional part):

([-+]?)(?:0?x)([0-9A-Fa-f]+)

Python Code

import re

p = re.compile(r'([-+]?)(?:0?x)([0-9A-Fa-f]+)')
test_str = u"WA2ąą-02 -7+12,7. -0x1AEfPq PP-.5P 0x1AEf +0x1AEf x0.6 words"

for x in re.findall(p, test_str):
    tmp = x[0] + x[1]
    print(int(tmp, 16))

Ideone Demo
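
As a quick check of what the hex pattern returns on that test string, the sketch below collects the (sign, digits) pairs and converts them in one pass. The variable names are illustrative only. One thing worth noticing: because the leading `0` is optional, `x0.6` also matches and contributes a hex `0`. Whether that is wanted depends on how a bare `x` prefix should be treated, which is an assumption on my part.

```python
import re

hex_pattern = re.compile(r'([-+]?)(?:0?x)([0-9A-Fa-f]+)')
text = "WA2ąą-02 -7+12,7. -0x1AEfPq PP-.5P 0x1AEf +0x1AEf x0.6 words"

# findall returns one (sign, digits) tuple per match because there are two groups
pairs = hex_pattern.findall(text)

# int() with base 16 accepts an optional leading sign
values = [int(sign + digits, 16) for sign, digits in pairs]

print(pairs)    # [('-', '1AEf'), ('', '1AEf'), ('+', '1AEf'), ('', '0')]
print(values)   # [-6895, 6895, 6895, 0]
```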

If you run into any issues, feel free to comment.

Upvotes: 1

Matt Messersmith

Reputation: 13747

I think your regex1 is good to go; it's simple enough.

regex2 = r'[-+]?\d*\.?\d+'

This seems to do the trick (but it's easy to miss edge cases with regex). It matches an optional - or +, followed by any number of digits, followed by an optional ., and then at least one digit.

Regex101 Demo
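
Putting the two patterns together, a minimal sketch of the whole count could look like this. It uses the string from the question text (with `12,7`), the names `word_re` and `number_re` are just for illustration, and the sum is rounded to hide floating-point noise.

```python
import re

text = "WA2ąą-02 -7+12,7. PP-.5P x0.6 words"

word_re = re.compile(r'\S+')               # regex1 from the question
number_re = re.compile(r'[-+]?\d*\.?\d+')  # regex2 suggested above

words = word_re.findall(text)
numbers = [float(m) for m in number_re.findall(text)]

print("Count words:", len(words))               # Count words: 5
print("Count numbers:", len(numbers))           # Count numbers: 7
print("Sum numbers:", round(sum(numbers), 2))   # Sum numbers: 12.1
```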

Upvotes: 2
