Reputation: 2812
I'm new to regular expressions. I have a very large text, and in my application I need to keep only the words of 4 characters and delete the rest. The text is in Spanish. So far I can select 4-character words, but I still need to delete the rest.
This is my regular expression:
\s(\w{3,3}[a-zA-ZáéíóúäëïöüñÑ])\s
How can I get all the 4-letter words in ASP.NET with VB?
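The following is a minimal VB.NET sketch that runs the question's pattern with Regex.Matches. The module, the CapturedWords function and the sample sentence are hypothetical. Each match consumes the whitespace on both sides of the word, so a word at the very start or end of the text is never found. Of two 4-letter words separated by a single space, only the first can match:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuestionPatternDemo
    ' The pattern from the question, unchanged.
    Private Const QuestionPattern As String = "\s(\w{3,3}[a-zA-ZáéíóúäëïöüñÑ])\s"

    ' Hypothetical helper: returns capture group 1 of every match.
    Public Function CapturedWords(text As String) As List(Of String)
        Dim words As New List(Of String)()
        For Each m As Match In Regex.Matches(text, QuestionPattern)
            words.Add(m.Groups(1).Value)
        Next
        Return words
    End Function

    Sub Main()
        ' Hypothetical sample: "casa" starts the text, "mesa" and "sopa"
        ' share one space, and "fría" ends the text.
        Dim found As List(Of String) = CapturedWords("casa con mesa sopa fría")
        Console.WriteLine(String.Join(",", found))
    End Sub
End Module
```

On that sample only "mesa" is returned. The words "casa", "sopa" and "fría" are skipped.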
Upvotes: 0
Views: 598
Reputation: 43683
/(?:\A|(?<=\P{L}))(\p{L}{4})(?:(?=\P{L})|\z)/g
Explanation:
/g (the global switch) searches repeatedly. .NET has no such switch, so call Regex.Matches to get every match.
\A is the start of the string (not the start of a line).
\p{L} matches a single code point in the Unicode category "letter".
\P{L} matches a single code point not in the category "letter".
{n} specifies an exact number of repetitions (n is a number).
\z is the end of the string (not the end of a line).
| is the logical OR (alternation) operator.
(?<=) is a lookbehind.
(?=) is a lookahead.
(?:) is a non-capturing group.
() is a capturing group (it can be referenced by a backreference).
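The pattern above can be used from VB.NET roughly as follows. This is an untested sketch, and the module and function names are hypothetical. The /.../g delimiters are dropped because .NET patterns are plain strings, and Regex.Matches returns every match, which is what /g asks for. \p{L} also covers uppercase accented letters such as Á and É, which the hand-written class in the question leaves out.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module FourLetterWords
    ' The answer's pattern without the /.../g delimiters.
    Private Const Pattern As String = "(?:\A|(?<=\P{L}))(\p{L}{4})(?:(?=\P{L})|\z)"

    ' Hypothetical helper: keeps only the 4-letter words, joined by single spaces.
    Public Function KeepFourLetterWords(text As String) As String
        ' Regex.Matches finds every match, playing the role of /g.
        Dim words As IEnumerable(Of String) =
            Regex.Matches(text, Pattern).Cast(Of Match)().Select(Function(m) m.Groups(1).Value)
        Return String.Join(" ", words)
    End Function

    Sub Main()
        ' Hypothetical sample text.
        Console.WriteLine(KeepFourLetterWords("casa con mesa sopa fría"))
    End Sub
End Module
```

Joining the matches with spaces is the simplest way to "delete the rest". Punctuation and line breaks are not preserved.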
Upvotes: 3
Reputation: 656
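This uses the character class provided in another answer, [a-zA-ZáéíóúäëïöüñÑ]. Note that \w does not match Spanish accented letters in every regex flavor. In JavaScript it is ASCII-only. In .NET it does match them by default, but it also matches digits and underscores, so an explicit letter class is stricter.
You can use the following pattern for the match. It works in reverse: it matches everything that is NOT a 4-character word, so replacing each match with " " leaves only the 4-character words: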
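A quick way to check how \w behaves in .NET follows. This is a sketch, and IsWordChar is a hypothetical helper. By default \w is Unicode-aware and matches ñ, while RegexOptions.ECMAScript restricts it to [a-zA-Z_0-9]. In either mode it also matches digits.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WordClassDemo
    ' Hypothetical helper: True when the whole input is one \w character.
    Public Function IsWordChar(s As String, options As RegexOptions) As Boolean
        Return Regex.IsMatch(s, "^\w$", options)
    End Function

    Sub Main()
        Console.WriteLine(IsWordChar("ñ", RegexOptions.None))       ' True: default \w includes letters such as ñ
        Console.WriteLine(IsWordChar("ñ", RegexOptions.ECMAScript)) ' False: ECMAScript \w is ASCII-only
        Console.WriteLine(IsWordChar("7", RegexOptions.None))       ' True: \w also matches digits
    End Sub
End Module
```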
/(^|(?<=(?<=\W)[a-zA-ZáéíóúäëïöüñÑ]{4,4}(?=\W)))(.*?)((?=(?<=\W)[a-zA-ZáéíóúäëïöüñÑ]{4,4}(?=\W))|$)/gis
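Approximate code in VB (not tested):
Imports System.Text.RegularExpressions

Dim input As String = "This is your text"
' .NET patterns take no /.../ delimiters. The i and s flags become RegexOptions,
' and g is implicit because Replace already replaces every match.
Dim pattern As String = "(^|(?<=(?<=\W)[a-zA-ZáéíóúäëïöüñÑ]{4,4}(?=\W)))(.*?)((?=(?<=\W)[a-zA-ZáéíóúäëïöüñÑ]{4,4}(?=\W))|$)"
Dim replacement As String = " "
Dim rgx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
Dim result As String = rgx.Replace(input, replacement)
Console.WriteLine("Original String: {0}", input)
Console.WriteLine("Replacement String: {0}", result)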
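The same "replace the complement" idea can also be written without lookarounds by swapping in a MatchEvaluator. This is a different technique from the pattern above, and KeepOnlyFourLetterRuns is a hypothetical name. The text is split into runs of letters and runs of non-letters. Every run except a 4-letter word becomes a space, and the leftover spaces are then collapsed:

```vb
Imports System
Imports System.Text.RegularExpressions

Module ComplementReplaceDemo
    ' Hypothetical helper: keeps 4-letter runs and blanks everything else.
    Public Function KeepOnlyFourLetterRuns(text As String) As String
        ' Each match is either a run of letters or a run of non-letters.
        Dim kept As String = Regex.Replace(text, "\p{L}+|\P{L}+",
            Function(m As Match) If(m.Length = 4 AndAlso Char.IsLetter(m.Value, 0), m.Value, " "))
        ' Collapse the spaces left behind and trim both ends.
        Return Regex.Replace(kept, " {2,}", " ").Trim()
    End Function

    Sub Main()
        ' Hypothetical sample text.
        Console.WriteLine(KeepOnlyFourLetterRuns("casa con mesa sopa fría"))
    End Sub
End Module
```

Deciding per run in code keeps the pattern short, at the cost of a second Replace call to tidy the spaces.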
You can see the result of the regex in action here:
Upvotes: 2
Reputation: 5865
/[^a-zA-ZáéíóúäëïöüñÑ][a-zA-ZáéíóúäëïöüñÑ]{4}[^a-zA-ZáéíóúäëïöüñÑ]/g
Translated: a non-letter, followed by 4 letters, followed by a non-letter. The g flag makes it match globally, that is, more than once.
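Because the non-letters on both sides are part of the match, the pattern above cannot find two 4-letter words separated by a single space, nor a word at the very start or end of the text. The sketch below compares it with a lookaround version that checks the neighboring characters without consuming them. The module, constant and function names are hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ConsumingContextDemo
    Private Const Letter As String = "[a-zA-ZáéíóúäëïöüñÑ]"
    Private Const NonLetter As String = "[^a-zA-ZáéíóúäëïöüñÑ]"

    ' The answer's pattern: the surrounding non-letters are consumed.
    Public Const Consuming As String = NonLetter & Letter & "{4}" & NonLetter

    ' Lookaround version: neighbors are checked but not consumed.
    Public Const NonConsuming As String = "(?<!" & Letter & ")" & Letter & "{4}(?!" & Letter & ")"

    ' Hypothetical helper: number of matches of pattern in text.
    Public Function CountMatches(text As String, pattern As String) As Integer
        Return Regex.Matches(text, pattern).Count
    End Function

    Sub Main()
        Dim sample As String = " casa mesa "
        Console.WriteLine(CountMatches(sample, Consuming))    ' 1: the space after "casa" is used up
        Console.WriteLine(CountMatches(sample, NonConsuming)) ' 2
    End Sub
End Module
```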
Check out this link to find out more info on looping over your matches: http://osherove.com/blog/2003/5/12/practical-parsing-using-groups-in-regular-expressions.html
Upvotes: -2