Reputation: 1409
I'm fairly new to regular expressions, so this is mostly a request for help with a specific pattern, plus a little on how to implement it in C#.
I have a large Excel file full of, among other things, repeated addresses written in different styles. Most of the differences are abbreviations of words like Avenue, etc.
For the simple ones I looked up the string.Replace() method:
address = address.Replace("Av ", "Av. "); // strings are immutable, so the result has to be assigned
That does the trick there and for some others. But if I want to replace the word "Ave", I run into the possibility of it being part of another word (some addresses are in Spanish, so this is likely to happen). I thought about including whitespace before and after (" ave "), but would that work if it's the first word in the string? Or should I use a pattern like this (it might be wrong too):
^[0-9a-zA-Z_#' ](Ave)\w //the word is **not** preceded by any character other than a whitespace and is followed by a whitespace
For expressions like that, I should use something along these lines, right?
string replacement = "Av.";
Regex rgx = new Regex(@"^[0-9a-zA-Z_#' ](Ave)\w"); // the pattern must be a string; a verbatim string (@"...") avoids double-escaping the backslash
string result = rgx.Replace(input, replacement);
Thanks
Upvotes: 0
Views: 186
Reputation: 694
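Regular expressions have a nifty tool for this: the \b anchor. It is not a character class; it is a zero-width assertion that matches at a word boundary, meaning the position between a word character and a non-word character, or the start or end of the string. So Ave\b
would only match Ave when it is followed by a space, a dot, the end of the string, or anything else that is not a word character. To also keep it from matching at the end of a longer word, put a boundary in front as well: \bAve\b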
Read all about word boundaries here: http://www.regular-expressions.info/wordboundaries.html
BTW, that site is THE place to go to learn about regular expressions.
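Also, if you were to do it the way you tried, it could be something like this: [^\w]Ave[^\w]
That literally is: not a word character (a-z, A-Z, 0-9 or _), then Ave, then again not a word character. (Using [^\s] for the last part would be wrong here, since it means "not whitespace" and would match the "n" in "Avenida".)
You could also use the shorthand for [^\w], which is \W, so it would then become \WAve\W
Note that this version needs an actual character on each side, so it fails at the start or end of the string, and a replacement would swallow those neighbouring characters too.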
But the \b way is better.
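To make the \b approach concrete, here is a minimal C# sketch of the replacement the question asks about. The class and method names (AddressNormalizer, Normalize) are made up for illustration, and two choices are assumptions rather than anything stated in the thread: matching is case-insensitive, and an optional trailing dot is consumed so that an existing "Ave." does not turn into "Av..".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the original posts.
public static class AddressNormalizer
{
    // \b is zero-width: it asserts a word boundary without consuming a character.
    // \.? additionally consumes one trailing dot if present (assumption: "Ave." should also become "Av.").
    // RegexOptions.IgnoreCase is an assumption so that "ave" and "AVE" are treated the same way.
    private static readonly Regex AveWord =
        new Regex(@"\bAve\b\.?", RegexOptions.IgnoreCase);

    public static string Normalize(string address)
    {
        // Replace returns a new string; the input is left untouched.
        return AveWord.Replace(address, "Av.");
    }
}
```

With this sketch, Normalize("Ave Reforma 12") gives "Av. Reforma 12" and Normalize("123 Ave. Juarez") gives "123 Av. Juarez", while "Calle Suave 5" and "Avenida Central" come back unchanged because there is no word boundary on both sides of "ave" in those words.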
Upvotes: 3
Reputation: 387
Add word boundaries (\b) around the word in your regex:
Regex.Match(content, @"\b(Ave)\b");
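Regex.Match only finds the word; to actually rewrite the addresses, the same pattern can be passed to Regex.Replace. Below is a rough sketch that applies the idea to several abbreviations at once. The class name, the Apply method, and the entries in the table are hypothetical examples, not data from the question; the optional \.? at the end is also an assumption, added so an abbreviation that already carries a dot is not given a second one.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names and table contents are illustrative only.
public static class AbbreviationReplacer
{
    // Sample mapping from a spelling found in the data to the preferred form.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            { "Ave", "Av." },
            { "Avenida", "Av." },
        };

    public static string Apply(string content)
    {
        foreach (KeyValuePair<string, string> pair in Replacements)
        {
            // Regex.Escape guards against keys that contain regex metacharacters.
            // \b on both sides restricts the match to whole words;
            // \.? swallows an existing trailing dot, if any.
            string pattern = @"\b" + Regex.Escape(pair.Key) + @"\b\.?";
            content = Regex.Replace(content, pattern, pair.Value);
        }
        return content;
    }
}
```

For example, Apply("Ave Juarez") returns "Av. Juarez" and Apply("Avenida Central") returns "Av. Central", while "Calle Suave" is returned as is. This version is case-sensitive; passing RegexOptions.IgnoreCase to Regex.Replace would change that.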
Upvotes: 1