Reputation: 27
I'm really bad with regular expressions and find them too complex. However, I need to use them for some string manipulation in classic ASP.
Input String :
"James John Junior
S.D. Industrial Corpn
D-2341, Focal Point, Phase 4-a,
Sarsona, Penns
Japan
Phone : 92-161-4633248 Fax : 92-161-253214
email : [email protected]"
Desired Output string:
"JXXXX JXXX JXXXXX
S.X. IXXXXXXXXX CXXXX
D-XXXX, FXXXX PXXXX, PXXXX 4-X,
SXXXXXX, PXXXX
JXXXX
PXXXX : 9X-XXX-XXXXXXX Fax : 9X-XXX-XXXXXX
eXXXX : [email protected]"
Note: We need to split the original string into words on a single space. Then, in each word, we need to replace every letter (lower and upper case) and every digit, except the first character of the word, with an "X".
I know it's sort of difficult, but a seasoned RegEx expert could nail this pretty easily, I would think. No?
Edit:
I've made some progress. I found a function (http://www.addedbytes.com/lab/vbscript-regular-expressions/) that sort of does the job, but it needs a little refinement, if anyone can help:
function ereg_replace(strOriginalString, strPattern, strReplacement, varIgnoreCase)
    ' Function replaces pattern with replacement
    ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
    dim objRegExp : set objRegExp = new RegExp
    with objRegExp
        .Pattern = strPattern
        .IgnoreCase = varIgnoreCase
        .Global = True
    end with
    ereg_replace = objRegExp.replace(strOriginalString, strReplacement)
    set objRegExp = nothing
end function
I'm calling it like so:
orgstr = ereg_replace(orgstr, "\w", "X", True)
However, the result looks like -
XXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXX.
XX, XXXXX XXXX, XXXXXX XXXXXX, XXXXXXX XXXXXXX, XXXXXXXXX
XXXXX : XXX-XXX-XXXX
XXX :
XXXXX : [email protected]
I'd like this to show the first character in every word. Any help out there?
Upvotes: 1
Views: 944
Reputation: 189505
This approach gets close:
Function AnonymiseWord(m, p, s)
    AnonymiseWord = Left(m, 1) & String(Len(m) - 1, "X")
End Function
Function AnonymiseText(input)
    Dim rgx: Set rgx = new RegExp
    rgx.Global = True
    rgx.Pattern = "\b\w+?\b"
    AnonymiseText = rgx.Replace(input, GetRef("AnonymiseWord"))
End Function
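This might get you close enough to what you need. If not, the basic approach is still sound, but you may need to fiddle with the pattern so that it matches exactly the stretches of text you want to put through AnonymiseWord. As written, \w+ treats "." and "-" as word breaks, so "S.D." comes back unchanged and "D-2341" becomes "D-2XXX".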
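One way to tighten the pattern, as a minimal sketch: match each run of non-whitespace characters (`\S+`), so that a "word" is whatever sits between spaces or line breaks, keep its first character, and mask only the letters and digits in the rest. The names MaskToken and MaskText are made up for illustration, and the character class assumes ASCII letters and digits only.

```vbscript
' Callback for RegExp.Replace: m is the matched token,
' p its position and s the source string (p and s are unused here).
Function MaskToken(m, p, s)
    Dim inner: Set inner = New RegExp
    inner.Global = True
    inner.Pattern = "[A-Za-z0-9]"
    ' Keep the first character, mask letters/digits in the remainder
    MaskToken = Left(m, 1) & inner.Replace(Mid(m, 2), "X")
End Function

Function MaskText(text)
    Dim rgx: Set rgx = New RegExp
    rgx.Global = True
    rgx.Pattern = "\S+"   ' one whitespace-delimited token at a time
    MaskText = rgx.Replace(text, GetRef("MaskToken"))
End Function
```

With this, `MaskText("D-2341, Focal Point")` should give `D-XXXX, FXXXX PXXXX`. Note that it follows the stated rule literally, so "Fax" becomes "FXX" rather than staying as in the desired output.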
Upvotes: 2
Reputation: 7490
Although I love regular expressions, you could do it without them, especially because VBScript does not support look behind.
Dim mystring, myArray, newString, i, j
Const forbiddenChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
myString = "James John Junior S.D. Industrial Corpn D-2341, Focal Point, Phase 4-a, Sarsona, Penns Japan Phone : 92-161-4633248 Fax : 92-161-253214 email : [email protected]"
myArray = split(myString, " ")
For i = lbound(myArray) to ubound(myArray)
    newString = left(myArray(i), 1)
    For j = 2 to len(myArray(i))
        If instr(forbiddenChars, mid(myArray(i), j, 1)) > 0 Then
            newString = newString & "X"
        else
            newString = newString & mid(myArray(i), j, 1)
        End If
    Next
    myArray(i) = newString
Next
myString = join(myArray, " ")
It doesn't cope with vbNewLine characters, but you get the idea. To handle them, you can first split on vbNewLine, then split each resulting line on the space and process the words as above.
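A rough sketch of that line-aware variant, assuming the input uses vbNewLine as its line separator (a bare vbLf would need its own Split). MaskWord and MaskLines are illustrative names, not part of the code above.

```vbscript
' Mask every letter/digit of a single word except its first character.
Function MaskWord(word)
    Const maskable = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    Dim k, ch, result
    result = Left(word, 1)
    For k = 2 To Len(word)
        ch = Mid(word, k, 1)
        If InStr(maskable, ch) > 0 Then
            result = result & "X"
        Else
            result = result & ch
        End If
    Next
    MaskWord = result
End Function

' Split into lines first, then into words, and rebuild the text.
Function MaskLines(text)
    Dim lines, words, i, j
    lines = Split(text, vbNewLine)
    For i = LBound(lines) To UBound(lines)
        words = Split(lines(i), " ")
        For j = LBound(words) To UBound(words)
            words(j) = MaskWord(words(j))
        Next
        lines(i) = Join(words, " ")
    Next
    MaskLines = Join(lines, vbNewLine)
End Function
```

Because each line is split separately, the first word on every line keeps its first character instead of being glued to the last word of the previous line.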
Upvotes: 0
Reputation: 93026
I have no idea about classic ASP, but if it does support (negative) lookbehinds and the only problem is the quantifier in the lookbehind, then why not turn it around and do it this way:
(?<!^)(?<!\s)[a-zA-Z0-9]
and replace with "X".
That is, replace every letter or digit that is preceded neither by whitespace nor by the start of the string/row.
See it here on Regexr
Upvotes: 1
Reputation: 336418
Well, in .NET it would be easy:
resultString = Regex.Replace(subjectString,
@"(?<= # Assert that there is before the current position...
\b # a word boundary
\w # one alphanumeric character (= first letter/digit/underscore)
[\w.@-]* # any number of alnum characters or ., @ or -
) # End of lookbehind
[\p{L}\p{N}] # Match any letter or digit to be replaced",
"X", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
The result, though, would be slightly different than what you wrote:
"JXXXX JXXX JXXXXX
S.X. IXXXXXXXXX CXXXX
D-XXXX, FXXXX PXXXX, PXXXX 4-X,
SXXXXXX, PXXXX
JXXXX
PXXXX : 9X-XXX-XXXXXXX FXX : 9X-XXX-XXXXXX
eXXXX : [email protected]"
(observe that Fax has also been changed to FXX)
Without .NET, you could try something like
orgstr = ereg_replace(orgstr, "\b(\w)[\w.@-]*", "$1XXXX", True) ' VBScript's RegExp.Replace refers to the first capture group as $1 in the replacement string
which would give you
"JXXXX JXXXX JXXXX
SXXXX IXXXX CXXXX
DXXXX, FXXXX PXXXX, PXXXX 4XXXX,
SXXXX, PXXXX
JXXXX
PXXXX : 9XXXX FXXXX : 9XXXX
eXXXX : sXXXX"
You won't get it better than that with a single regex.
Upvotes: 1