Ashish Sharma
Ashish Sharma

Reputation: 331

Regex for finding float numbers in a comma separated string

I am have a csv file that has comma separated line something like this:

"22.0,asc1.0$*xyz,12.00,12.00023000,,1.0asc1.0,,-12.4442,0.00000,5.2E-5\n"

I want to find all floating point numbers only.

Challenges:

This expression correctly captures the float part but doesn't work with above scenarios correctly:

[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

Thanks in advance!

Complete Code in python (based on the feedback below):

import pandas as pd
from io import StringIO

data = {"a": 1.0, "b": "xyz.2.0", "c": "lol"}

buffer = StringIO()

df = pd.DataFrame([data])
df.to_csv(buffer, index=False)
input_ = buffer.getvalue()
print(input_)

import re

expression = "(?<=,|^)[+-]?\d+\.?\d*(?:E[+-]?\d+)?(?=,|$)"

def transform_float(m):
    value = m.group()
    print(value)
    return value

result = re.sub(expression, transform_float, input_)
print(result)

Upvotes: 1

Views: 1478

Answers (1)

anubhava
anubhava

Reputation: 785156

You may use this regex with look arounds:

(?:(?<=,)|(?<=^))[+-]?\d+\.?\d*(?:E[+-]?\d+)?(?=,|$)

RegEx Demo

RegEx Description:

  • (?:(?<=,)|(?<=^)): Lookbehind to assert that we have a comma or line start at previous position
  • [+-]?\d+\.?\d*(?:E[+-]?\d+)?: Match an integer or floating point number with optional E part
  • (?=,|$): Lookahead to assert that we have a comma or line end ahead

Upvotes: 2

Related Questions