Enrique Bruzual

Reputation: 493

Python regular expression - extracting float pattern

I am trying to extract a particular "float" from a string that contains multiple formatted "integers", "floats" and dates. The float in question is preceded by some standardized text.

String sample

my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

I have been able to extract the desired float, 2864.35, from my_string, but if this float changes format, or another float with the same format shows up, my script won't return the desired result.

import re

regex = r"(\d+\.\d+)"
matches = re.findall(regex, my_string)
for match in matches:
    print(match)

Desired return from regular expression regex

Some variations of the desired float, in the second line of the string only

What you see below are three examples of the same line, the second line in my_string. The regex should return only the float from line two, despite variations such as soles or Soles.

Any assistance in editing or re-writing the current regular expression regex is greatly appreciated

Upvotes: 2

Views: 355

Answers (2)

FailSafe

Reputation: 482

EDIT - Hmmm... If it has to follow soles then hopefully this helps

Try these. My console can't take the extra characters (the emoji), so I left them out, but based on your input:

>>> import re
>>> my_string = """03/14/2019 07:07 AM
Soles in mDm : 2864.35
BTC purchase in mdm: 11,202,782.0
Soles in mDm : 2864.35
soles MDM: 2,864.35
Soles in mdm :2,864.355
"""


>>> re.findall(r'(?i)soles[\S\s]*?(\d[\d,]*\.\d+)', my_string)

#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']



>>> re.findall(r'[Ss]oles[\S\s]*?(\d[\d,]*\.\d+)', my_string)

#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']
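
One caveat with the patterns above: `[\S\s]*?` can cross line boundaries, so a "soles" line that has no decimal number would pick up a float from a later line. Below is a minimal sketch of a same-line variant, assuming the label and the number always share a line. The names `SOLES_RE` and `extract_soles` are hypothetical and not part of the original answer.

```python
import re

# Hypothetical helper, not from the original answer.
# [^\n\d]* keeps the scan on the same line as "soles" and stops at the first digit.
SOLES_RE = re.compile(r"soles[^\n\d]*(\d[\d,]*\.\d+)", re.IGNORECASE)


def extract_soles(text):
    """Return every float-like token that follows 'soles' on the same line."""
    return SOLES_RE.findall(text)


sample = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
soles MDM: 2,864.35
Soles in mdm :2,864.355
"""

print(extract_soles(sample))
# ['2864.35', '2,864.35', '2,864.355']
```

The BTC line is skipped because it never mentions "soles", and the emoji do not matter because the pattern only looks at what follows the label.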

Upvotes: 2

A l w a y s S u n n y

Reputation: 38502

If you want to match multiple instances, enable global matching (the g flag in online regex testers; Python has no g flag, so use re.findall or re.finditer instead), otherwise only the first instance is matched. Regex:

(?<=:)\s?([\d,]*\.\d+)

With Python,

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = ("\n"
    "    💵Soles in mDm : 2864.35⬇\n"
    "    soles MDM: 2,864.35\n"
    "    Soles in mdm :2,864.355\n")

# re.search returns only the first match in the string
matches = re.search(regex, test_str, re.IGNORECASE)

if matches:
    print("Match was found at {start}-{end}: {match}".format(start=matches.start(), end=matches.end(), match=matches.group()))

    # Capture groups are numbered from 1; group 0 is the whole match
    for group_num in range(1, len(matches.groups()) + 1):
        print("Group {group_num} found at {start}-{end}: {group}".format(group_num=group_num, start=matches.start(group_num), end=matches.end(group_num), group=matches.group(group_num)))
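
To collect every match rather than just the first, the same pattern can be passed to `re.finditer`. The sketch below also converts each captured token to a Python float by stripping the thousands separators. `to_float` is a hypothetical helper, and it assumes commas are only ever used as thousands separators.

```python
import re

regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = (
    "💵Soles in mDm : 2864.35⬇\n"
    "soles MDM: 2,864.35\n"
    "Soles in mdm :2,864.355\n"
)


def to_float(token):
    # Hypothetical helper: drop thousands separators before converting.
    return float(token.replace(",", ""))


# finditer yields one match object per non-overlapping match.
values = [to_float(m.group(1)) for m in re.finditer(regex, test_str)]
print(values)
# [2864.35, 2864.35, 2864.355]
```

Note that this pattern keys on the colon rather than on the word "soles", so it would also match a decimal number after a colon on any other line.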

Upvotes: 0
