Reputation: 1126
I need to split an address line into 3 parts, each of a different size. In this case: 40, 30 and 30 characters. I want to split the input string at spaces so the parts make some sense.
For this, I wrote a regular expression: (.{1,32})([ ]+.{1,30}){0,1}([ ]+.{1,30}){0,1}
and tried it on this website: https://regex101.com. It worked!
I moved to Visual Studio to write some code to check it out:
public static void TEST()
{
    List<string> ok = new List<string>();
    List<string> err = new List<string>();
    var lista = GetLista();
    foreach (string item in lista)
    {
        Regex regex = new Regex(@"(.{1,32})([ ]+.{1,30}){0,1}([ ]+.{1,30}){0,1}");
        // A foreach iteration variable cannot be reassigned, so trim into a local.
        string dir = item.Trim();
        GroupCollection cap = regex.Match(dir).Groups;
        if (cap.Count == 0) err.Add(dir);
        else ok.Add($"{dir};{cap[0].Value};{(cap.Count > 1 ? cap[1].Value.Trim() : "")};{(cap.Count > 2 ? cap[2].Value : "")};{(cap.Count > 3 ? cap[3].Value.Trim() : "")}");
    }
    File.WriteAllLines("ok.txt", ok);
    File.WriteAllLines("er.txt", err);
}
The results are very different; it doesn't match at all. Are regexes somehow different in C#?
Is there any other way to achieve this?
EDITED: The regexes given were different.
UPDATE: I'll provide an example. Let's take this string: "ERIK ADESIR COMPANY LA ISLA DE LA PALMA".
C# result: "ERIK ADESIR COMPANY LA ISLA DE L";"ERIK ADESIR COMPANY LA ISLA DE L";;
Wanted: "ERIK ADESIR COMPANY LA ISLA DE";"LA PALMA";""
I think the problem is that the regex is not taking the whole string, just part of it.
Upvotes: 1
Views: 123
Reputation: 626861
You want to match the whole string, so you need to add anchors, ^ and $. Next, you need to get the captured substrings, not the whole match. Note that the GroupCollection returns all captured group values, with the whole match as the first item. So, for a successful match, your match.Groups will contain 1 + the number of capturing groups items, whether or not each optional group actually took part in the match. You need to ignore the first item.
Also, to check if a group matched, you need to use cap[x].Success rather than cap.Count > x.
So, you need code like this:
foreach (string dir in lista)
{
    var match = Regex.Match(dir, @"^(.{1,32})([ ]+.{1,30})?([ ]+.{1,30})?$");
    if (match.Success)
    {
        var cap = match.Groups;
        // Groups 2 and 3 start with the separating spaces, so trim them.
        ok.Add($"{dir};{cap[1].Value};{(cap[2].Success ? cap[2].Value.Trim() : "")};{(cap[3].Success ? cap[3].Value.Trim() : "")}");
    }
    else
    {
        err.Add(dir);
    }
}
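To make the point about Groups and Success concrete, here is a minimal sketch. The class and method names (GroupsDemo, Describe) are made up for illustration; the two patterns are the unanchored one from the question and the anchored one above. It reports Groups.Count together with each group, showing "-" for a group that did not take part in the match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Pattern from the question (no anchors) and the anchored version from this answer.
    public const string Unanchored = @"(.{1,32})([ ]+.{1,30}){0,1}([ ]+.{1,30}){0,1}";
    public const string Anchored = @"^(.{1,32})([ ]+.{1,30})?([ ]+.{1,30})?$";

    // Returns "Count|g1|g2|g3"; a group that did not participate is shown as "-".
    // Intended for inputs that the pattern matches.
    public static string Describe(string pattern, string input)
    {
        GroupCollection cap = Regex.Match(input, pattern).Groups;
        return $"{cap.Count}|{Show(cap[1])}|{Show(cap[2])}|{Show(cap[3])}";
    }

    private static string Show(Group g)
    {
        // Success tells whether this particular group captured anything.
        return g.Success ? g.Value.Trim() : "-";
    }
}
```

For "ERIK ADESIR COMPANY LA ISLA DE LA PALMA", Describe with the unanchored pattern gives 4|ERIK ADESIR COMPANY LA ISLA DE L|-|- (the result seen in the question), while the anchored pattern gives 4|ERIK ADESIR COMPANY LA ISLA DE|LA PALMA|-. In both cases Count is 4, so comparing Count against a group index says nothing about whether that group matched.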
See the C# demo online:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> ok = new List<string>();
List<string> err = new List<string>();
var lista = new[] { "ERIK ADESIR COMPANY LA ISLA DE LA PALMA" };
foreach (string dir in lista)
{
    var match = Regex.Match(dir, @"^(.{1,32})([ ]+.{1,30})?([ ]+.{1,30})?$");
    if (match.Success)
    {
        var cap = match.Groups;
        // Groups 2 and 3 start with the separating spaces, so trim them.
        ok.Add($"{dir};{cap[1].Value};{(cap[2].Success ? cap[2].Value.Trim() : "")};{(cap[3].Success ? cap[3].Value.Trim() : "")}");
    }
    else
    {
        err.Add(dir);
    }
}
Console.WriteLine(string.Join("\n", ok));
Output:
ERIK ADESIR COMPANY LA ISLA DE LA PALMA;ERIK ADESIR COMPANY LA ISLA DE;LA PALMA;
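As for "any other way": one possible alternative, sketched here under assumptions, is to skip the regex and pack whole words greedily into parts of limited width. AddressSplitter and Split are hypothetical names, the widths are passed in by the caller, and returning null for text that does not fit is just one design choice. This is not guaranteed to behave identically to the regex in every edge case (for example, runs of several spaces are collapsed to one).

```csharp
using System;

public static class AddressSplitter
{
    // Hypothetical helper: packs whole words, in order, into parts whose
    // maximum lengths are given by widths. Returns null if the text does not fit.
    public static string[] Split(string text, params int[] widths)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var parts = new string[widths.Length];
        for (int i = 0; i < parts.Length; i++) parts[i] = "";

        int part = 0;
        foreach (string word in words)
        {
            // Advance to the next part while the word does not fit in the current one.
            while (part < widths.Length && !Fits(parts[part], word, widths[part]))
                part++;
            if (part == widths.Length) return null; // ran out of parts
            parts[part] = parts[part].Length == 0 ? word : parts[part] + " " + word;
        }
        return parts;
    }

    private static bool Fits(string current, string word, int width)
    {
        // One extra character for the separating space when the part is not empty.
        int needed = current.Length == 0 ? word.Length : current.Length + 1 + word.Length;
        return needed <= width;
    }
}
```

Calling AddressSplitter.Split("ERIK ADESIR COMPANY LA ISLA DE LA PALMA", 32, 30, 30) yields the three parts "ERIK ADESIR COMPANY LA ISLA DE", "LA PALMA" and "", which is the wanted result from the question.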
Upvotes: 2
Reputation: 71
The problem could be that the Regex in your C# code
new Regex(@"(.{1,40})([ ]+.{1,30}){0,1}([ ]+.{1,30}){0,1}");
is not equal to the one you used for testing:
(.{1,32})([ ]+.{1,30}){0,1}([ ]+.{1,30}){0,1}
In C# you start with (.{1,40}) but your example starts with (.{1,32})
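One way to keep the tested pattern and the coded pattern from drifting apart is to build the pattern from the widths in a single place. The sketch below is only an illustration: AddressPattern and Build are made-up names, and the ^ and $ anchors are borrowed from the other answer rather than from the original pattern.

```csharp
using System;

public static class AddressPattern
{
    // Hypothetical helper: builds the three-part pattern from the given widths,
    // so the numbers are written down only once.
    public static string Build(int first, int second, int third)
    {
        return "^(.{1," + first + "})([ ]+.{1," + second + "})?([ ]+.{1," + third + "})?$";
    }
}
```

AddressPattern.Build(32, 30, 30) returns ^(.{1,32})([ ]+.{1,30})?([ ]+.{1,30})?$, which can be printed and pasted into regex101 to test exactly what the code runs.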
Upvotes: 0