Let's do this

Reputation: 11

C# [regex] trim spaces before specific word

I want to remove all spaces between groups of digits that come before the words "usd" and "eur". I have a regex pattern like this:

@"\b(\d\s*)+\s(usd|eur)"

How can I exclude the space and usd|eur from the resulting match?

String example: "sdklfjsd 10 343 usd ds 232 300 eur"

Result should be: "sdklfjsd 10343 usd ds 232300 eur"

string line = "2 300 $ 12 Asdsfd 2  300  530 usd and 2  351 eur";
Regex defaultRegex = new Regex(@"\b(\d+\s*)+(usd|eur)");
MatchCollection matches = defaultRegex.Matches(line);
Console.WriteLine("Parsing '{0}'", line);
for (int ctr = 0; ctr < matches.Count; ctr++)
    Console.WriteLine("={0}){1}", ctr, matches[ctr].Value);

Upvotes: 0

Views: 410

Answers (4)

The fourth bird

Reputation: 163207

You could also use a positive lookbehind and a positive lookahead to match all the spaces you want to remove:

(?<=\d)\s+(?=(?:\d+\s+)*\d+\s+(?:eur|usd)\b)

Explanation

  • (?<=\d) Positive lookbehind to assert what is on the left is a digit
  • \s+ Match 1+ whitespace characters
  • (?= Positive lookahead to assert what is on the right is
    • (?:\d+\s+)* Repeat 0+ times matching 1+ digits followed by 1+ whitespace characters
    • \d+\s+(?:eur|usd)\b match 1+ digits followed by 1+ whitespace characters and eur or usd
  • ) Close positive lookahead


string line = "2 300 $ 12 Asdsfd 2  300  530 usd and 2  351 eur";
string result = Regex.Replace(line , @"(?<=\d)\s+(?=(?:\d+\s+)*\d+\s+(?:eur|usd)\b)", "");
Console.WriteLine(result); // 2 300 $ 12 Asdsfd 2300530 usd and 2351 eur
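
As a quick check, the same pattern can be run against the example string from the question. This is only a sketch: the helper name CollapseAmountSpaces is made up for illustration, and the code assumes a .NET version that supports top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper name; the pattern is the one from this answer.
// It removes whitespace that follows a digit and is followed only by
// more digit groups and then "eur" or "usd".
static string CollapseAmountSpaces(string input) =>
    Regex.Replace(input, @"(?<=\d)\s+(?=(?:\d+\s+)*\d+\s+(?:eur|usd)\b)", "");

Console.WriteLine(CollapseAmountSpaces("sdklfjsd 10 343 usd ds 232 300 eur"));
// sdklfjsd 10343 usd ds 232300 eur
```

The space in front of the currency word is kept because the lookahead requires digits immediately after the whitespace being removed.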


Upvotes: 0

Jon Hanna

Reputation: 113232

There may be a more elegant way, but it can be done easily with a MatchEvaluator:

new Regex(@"\b(\d+\s*)+(?=\s(usd|eur))").
    Replace("sdklfjsd 10   343  usd ds 232 300 eur",
        m => string.Join("", m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value.Trim())))

The regex \b(\d+\s*)+(?=\s(usd|eur)) uses a lookahead to match only numbers that are followed by \s(usd|eur), and a capturing group that records each consecutive \d+\s* chunk as a separate capture. (I kept the \b boundary from your question on the assumption that, for abc12 34 56 eur, matching only 34 56 is the desired behaviour; remove it otherwise.)

Then for each match it gets all of that group's captures, trims them all, and concatenates them together to produce the replacement text.
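
For completeness, here is a self-contained sketch of the snippet above with the using directives it needs (System.Linq for Cast and Select). The helper name JoinAmountGroups is hypothetical, and top-level statements are assumed to be available.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper name; the regex and the evaluator follow the snippet above.
static string JoinAmountGroups(string input)
{
    var regex = new Regex(@"\b(\d+\s*)+(?=\s(usd|eur))");
    // Group 1 repeats, so its Captures collection holds every "digits + spaces" chunk.
    return regex.Replace(input, m =>
        string.Join("", m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value.Trim())));
}

Console.WriteLine(JoinAmountGroups("sdklfjsd 10   343  usd ds 232 300 eur"));
// sdklfjsd 10343 usd ds 232300 eur
```

Exactly one whitespace character is left in front of the currency word, because the lookahead \s is not part of the match and so is not replaced.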

(Note that currency codes should generally be capitalised, so you may have another issue there.)
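
If the input may contain upper-case or mixed-case codes, one possible adjustment (an assumption on my part, not something the question asked for) is to pass RegexOptions.IgnoreCase. The helper name below is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; same regex as above, but matched case-insensitively
// so that "USD", "Eur", etc. are also recognised.
static string JoinAmountGroupsAnyCase(string input)
{
    var regex = new Regex(@"\b(\d+\s*)+(?=\s(usd|eur))", RegexOptions.IgnoreCase);
    return regex.Replace(input, m =>
        string.Join("", m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value.Trim())));
}

Console.WriteLine(JoinAmountGroupsAnyCase("total 10 343 USD or 232 300 Eur"));
// total 10343 USD or 232300 Eur
```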

Upvotes: 1

Yuriy Faktorovich

Reputation: 68667

Assuming there are only two numbers, you can use

\b(\d+)\s*(\d+)(?=\s(usd|eur)) with a replacement string of $1$2
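
A minimal sketch of how that could be wired up with Regex.Replace, including what happens when the two-number assumption does not hold. The helper name JoinTwoGroups is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; pattern and replacement string are the ones given above.
static string JoinTwoGroups(string input) =>
    Regex.Replace(input, @"\b(\d+)\s*(\d+)(?=\s(usd|eur))", "$1$2");

Console.WriteLine(JoinTwoGroups("sdklfjsd 10 343 usd ds 232 300 eur"));
// sdklfjsd 10343 usd ds 232300 eur

// With three digit groups, only the last two are joined.
Console.WriteLine(JoinTwoGroups("2 300 530 usd"));
// 2 300530 usd
```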

Upvotes: 1

Matt.G

Reputation: 3609

Try Regex: (\d+) *(\d+)(?= (?:usd|eur))
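
The answer does not say what to replace the match with; the sketch below assumes a replacement string of $1$2. The helper name JoinWithLiteralSpaces is hypothetical. Note that this pattern only handles literal space characters, so other whitespace such as tabs is left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the "$1$2" replacement is an assumption,
// since the answer only gives the pattern.
static string JoinWithLiteralSpaces(string input) =>
    Regex.Replace(input, @"(\d+) *(\d+)(?= (?:usd|eur))", "$1$2");

Console.WriteLine(JoinWithLiteralSpaces("sdklfjsd 10 343 usd ds 232 300 eur"));
// sdklfjsd 10343 usd ds 232300 eur
```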


Upvotes: 1
