Jack Thor

Reputation: 1604

Regex removing empty spaces when using replace

My problem is not about removing spaces but keeping them. I want to find strings of the form >[database value] in a document and then remove the >, [ and ] characters from them. The code below takes the document as a string. The first pattern finds anything wrapped in >[...], and the second pattern is supposed to remove only the >, [ and ] characters.

  using System;
  using System.Text.RegularExpressions;

  string decoded = "document in string format";
  string pattern = @">\[[A-z, /, \s]*\]";
  string pattern2 = @"[>, \[, \]]";
  Regex rgx = new Regex(pattern);
  Regex rgx2 = new Regex(pattern2);
  foreach (Match match in rgx.Matches(decoded))
  {
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
  }

My first Console.WriteLine prints the correct match, for example >[123 sesame St]. My second one, however, shows that the replacement removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see anything in my regex that should replace spaces. Am I missing something, or does Replace remove spaces implicitly?
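A minimal reproduction of the behaviour described above, as a sketch: it reuses the question's two patterns verbatim, while the sample text and the names Repro and CleanAll are hypothetical. The sample uses only letters because the class [A-z, /, \s] contains no digits, so a value such as >[123 sesame St] would not actually be matched by the first pattern as written.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Repro
{
    // Patterns copied verbatim from the question.
    public const string Pattern = @">\[[A-z, /, \s]*\]";
    public const string Pattern2 = @"[>, \[, \]]";

    // Runs the question's two-step approach and returns the cleaned values.
    public static string[] CleanAll(string decoded)
    {
        Regex rgx = new Regex(Pattern);
        Regex rgx2 = new Regex(Pattern2);
        MatchCollection matches = rgx.Matches(decoded);
        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            results[i] = rgx2.Replace(matches[i].Value, "");
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample text, letters only (Pattern has no 0-9).
        string decoded = "Address: >[Sesame St] and >[Main St]";
        foreach (string value in CleanAll(decoded))
        {
            Console.WriteLine(value); // prints SesameSt, then MainSt
        }
        // The space character is itself a member of Pattern2's class.
        Console.WriteLine(Regex.IsMatch(" ", Pattern2)); // True
    }
}
```

The last line prints True: the space is one of the characters inside the second class, so Replace deletes it along with >, [ and ].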

Upvotes: 1

Views: 135

Answers (3)

participant

Reputation: 3013

By writing [>, \[, \]] in pattern2, you define a character class that matches any single character listed between the brackets: >, the comma, the space, [ and ]. That is why your spaces disappear (and commas would be removed too). Since you don't want to match spaces or commas, leave them out:

string pattern2 = @"[>\[\]]";

Alternatively, you could use

string pattern2 = @"(>\[|\])";

This matches either >[ or ], which expresses your intention more directly.
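A short sketch comparing the two suggested patterns on a single matched value. The names MarkerStripper, StripWithClass and StripWithAlternation and the sample value are hypothetical, and the sketch uses the static Regex.Replace overload instead of separate Regex instances.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerStripper
{
    // Character class listing only the three characters to delete.
    public static string StripWithClass(string value) =>
        Regex.Replace(value, @"[>\[\]]", "");

    // Alternation: delete the two-character opener ">[" or a closing "]".
    public static string StripWithAlternation(string value) =>
        Regex.Replace(value, @"(>\[|\])", "");

    public static void Main()
    {
        string matched = ">[Sesame St]"; // hypothetical matched value
        Console.WriteLine(StripWithClass(matched));       // Sesame St
        Console.WriteLine(StripWithAlternation(matched)); // Sesame St
    }
}
```

The two differ only when the value contains a stray > or [: the class removes every such character, while the alternation removes > only when it is immediately followed by [, and never removes a lone [.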

Upvotes: 1

ScoJo

Reputation: 91

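The classes [A-z, /, \s] and [>, \[, \]] in your patterns also match commas and spaces, because every character between the brackets, separators included, belongs to the class. Just list the characters without separating them, like this: [A-Za-z/\s]. Note that A-z is not the same as A-Za-z: the ASCII range from A to z also contains [, \, ], ^, _ and the backtick.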

string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>,\[\]]";

Edited to include Casimir's tip.
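A sketch of the question's loop rewritten with this answer's two patterns, plus a check of the A-z range. The class and method names and the sample text are assumptions for illustration. The comma still listed in the second pattern is harmless here, because the first pattern never lets a comma into a match. Digits are also excluded by [A-Za-z/\s], so a value like >[123 sesame St] would need 0-9 added to the class.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CorrectedPatterns
{
    // Patterns from this answer: no commas or spaces used as separators.
    private static readonly Regex Finder = new Regex(@">\[[A-Za-z/\s]*\]");
    private static readonly Regex Markers = new Regex(@"[>,\[\]]");

    // Finds every >[...] value and strips the marker characters from it.
    public static List<string> ExtractValues(string decoded)
    {
        var values = new List<string>();
        foreach (Match match in Finder.Matches(decoded))
        {
            values.Add(Markers.Replace(match.Value, ""));
        }
        return values;
    }

    // A-z spans ASCII 0x41..0x7A, which also covers [ \ ] ^ _ and the backtick.
    public static bool InSloppyRange(char c) => Regex.IsMatch(c.ToString(), "[A-z]");

    public static void Main()
    {
        foreach (string v in ExtractValues("Ship to >[Sesame St] via >[Air/Sea]"))
        {
            Console.WriteLine(v); // Sesame St, then Air/Sea
        }
        Console.WriteLine(InSloppyRange('_')); // True, although _ is not a letter
    }
}
```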

Upvotes: 3

Casimir et Hippolyte

Reputation: 89629

After rereading your question (if I understand it correctly), I realize that your two-step approach is unnecessary. You only need one replacement using a capture group:

string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);

string result = rgx.Replace(yourtext, "$1");

Pattern details:

>\[         # literals: >[
(           # open the capture group 1
    [^]]*   # all that is not a ]
)           # close the capture group 1
]           # literal ]

The replacement string refers to capture group 1 with $1.
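A sketch of this single-pass approach, assuming the goal is either to rewrite the whole text or just to collect the values; SinglePass, Unwrap, Values and the sample text are hypothetical. The ] is escaped here ([^\]]* and \]) purely for readability; escaping is optional in .NET, which treats a ] that comes first in a class as a literal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePass
{
    // Same idea as the answer's pattern, with ] escaped for readability.
    private static readonly Regex Bracketed = new Regex(@">\[([^\]]*)\]");

    // Replaces each >[value] with just value, keeping all surrounding text.
    public static string Unwrap(string text) => Bracketed.Replace(text, "$1");

    // If only the values are needed, read capture group 1 of each match.
    public static string[] Values(string text)
    {
        MatchCollection matches = Bracketed.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value;
        }
        return result;
    }

    public static void Main()
    {
        string text = "Deliver to >[123 Sesame St] before >[Friday]"; // hypothetical
        Console.WriteLine(Unwrap(text)); // Deliver to 123 Sesame St before Friday
        foreach (string v in Values(text)) Console.WriteLine(v);
    }
}
```

Because [^\]]* accepts any character except ], digits, commas and other punctuation inside the brackets are kept, unlike with the letter-only classes in the question.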

Upvotes: 1
