Dinesh M

Reputation: 27

Developing Regular Expression to my needs

I'm really bad with regular expressions and find them too complex. However, I need to use them to do some string manipulation in classic ASP.

Input String :

"James John Junior 

S.D. Industrial Corpn  
D-2341, Focal Point, Phase 4-a, 
Sarsona, Penns
Japan
Phone : 92-161-4633248 Fax : 92-161-253214
email : [email protected]"

Desired Output string:

"JXXXX JXXX JXXXXX 

S.X. IXXXXXXXXX CXXXX  
D-XXXX, FXXXX PXXXX, PXXXX 4-X, 
SXXXXXX, PXXXX
JXXXX
PXXXX : 9X-XXX-XXXXXXX Fax : 9X-XXX-XXXXXX
eXXXX : [email protected]"

Note: We need to split the original string into words on single spaces. Then, in each word, we need to replace every letter (lower and upper case) and digit, except for the first character of the word, with an "X".

I know it's sort of difficult, but I would think a seasoned RegEx expert could nail this pretty easily. No?

Edit:

I've made some progress. I found a function (http://www.addedbytes.com/lab/vbscript-regular-expressions/) that sort of does the job, but it needs a little refinement, if anyone can help.

Function ereg_replace(strOriginalString, strPattern, strReplacement, varIgnoreCase)
    ' Function replaces pattern with replacement
    ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
    Dim objRegExp : Set objRegExp = New RegExp
    With objRegExp
        .Pattern = strPattern
        .IgnoreCase = varIgnoreCase
        .Global = True
    End With
    ereg_replace = objRegExp.Replace(strOriginalString, strReplacement)
    Set objRegExp = Nothing
End Function

I'm calling it like this:

orgstr = ereg_replace(orgstr, "\w", "X", True)

However, the result looks like this:

XXXXX XXXXXXXX

XXXXXXXX XXXXXXXX XXX.
XX, XXXXX XXXX, XXXXXX XXXXXX, XXXXXXX XXXXXXX, XXXXXXXXX
XXXXX : XXX-XXX-XXXX
XXX :
XXXXX : [email protected]

I'd like this to show the first character in every word. Any help out there?

Upvotes: 1

Views: 944

Answers (4)

AnthonyWJones

Reputation: 189505

This approach gets close:

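' Callback for RegExp.Replace: m is the matched text, p its position, s the source string.
Function AnonymiseWord(m, p, s)

   ' Keep the first character and replace the rest of the match with "X".
   AnonymiseWord = Left(m, 1) & String(Len(m) - 1, "X")

End Function


Function AnonymiseText(input)

    Dim rgx: Set rgx = New RegExp
    rgx.Global = True
    ' A run of word characters between two word boundaries.
    rgx.Pattern = "\b\w+\b"

    ' Passing a function reference makes Replace call AnonymiseWord for each match.
    AnonymiseText = rgx.Replace(input, GetRef("AnonymiseWord"))

End Function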

This might get you close enough to what you need. If not, the basic approach is sound, but you may need to adjust the pattern so that it matches exactly the stretches of text you want to pass through AnonymiseWord.
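
As one possible way of adjusting the pattern, here is a minimal sketch that matches whole whitespace-delimited tokens with `\S+` and masks only the letters and digits after the first character inside the callback. The names MaskToken and MaskText are made up for this illustration, and it assumes, like the code above, that RegExp.Replace accepts a function reference obtained from GetRef.

```vbscript
' Hypothetical helper: callback invoked once per matched token.
' token = matched text, offset = position of the match, source = whole input string.
Function MaskToken(token, offset, source)
    Dim tail: Set tail = New RegExp
    tail.Global = True
    tail.Pattern = "[A-Za-z0-9]"
    ' Keep the first character; replace letters and digits in the remainder with "X".
    MaskToken = Left(token, 1) & tail.Replace(Mid(token, 2), "X")
End Function

' Hypothetical helper: masks every whitespace-delimited token in text.
Function MaskText(text)
    Dim rgx: Set rgx = New RegExp
    rgx.Global = True
    rgx.Pattern = "\S+"   ' a token is any run of non-whitespace characters
    MaskText = rgx.Replace(text, GetRef("MaskToken"))
End Function
```

Because `\S+` stops at any whitespace, line breaks separate tokens just as spaces do, and punctuation inside a token is left untouched. Under those assumptions, MaskText("D-2341, Focal Point, Phase 4-a,") should return "D-XXXX, FXXXX PXXXX, PXXXX 4-X,". Note that "Fax" would also become "FXX".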

Upvotes: 2

AutomatedChaos

Reputation: 7490

Although I love regular expressions, you could do it without them, especially because VBScript does not support lookbehind.

Dim mystring, myArray, newString, i, j
Const forbiddenChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
myString = "James John Junior   S.D. Industrial Corpn   D-2341, Focal Point, Phase 4-a,  Sarsona, Penns Japan Phone : 92-161-4633248 Fax : 92-161-253214 email : [email protected]"
myArray = split(myString, " ")

For i = lbound(myArray) to ubound(myArray)
    ' Keep the first character of the word as it is
    newString = left(myArray(i), 1)
    For j = 2 to len(myArray(i))
        If instr(forbiddenChars, mid(myArray(i), j, 1)) > 0 Then
            ' Letters and digits are masked
            newString = newString & "X"
        else
            ' Punctuation and other characters are copied unchanged
            newString = newString & mid(myArray(i), j, 1)
        End If
    Next
    myArray(i) = newString
Next

myString = join(myArray, " ")

It doesn't cope with vbNewLine characters, but you get the idea. For example, you could first split on vbNewLine, then iterate through the resulting elements and split each one on the space.
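
The extra split on line breaks could be sketched roughly as follows. MaskWord and MaskLines are hypothetical names introduced for this illustration, and the sketch assumes the lines of the input are separated by vbNewLine (carriage return plus line feed).

```vbscript
' Hypothetical helper: keeps the first character of a word and
' replaces every later letter or digit with "X".
Function MaskWord(word)
    Const maskable = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    Dim result, j, ch
    result = Left(word, 1)
    For j = 2 To Len(word)
        ch = Mid(word, j, 1)
        If InStr(maskable, ch) > 0 Then
            result = result & "X"
        Else
            result = result & ch
        End If
    Next
    MaskWord = result
End Function

' Hypothetical helper: splits on vbNewLine first, then on single spaces.
Function MaskLines(text)
    Dim lines, words, i, j
    lines = Split(text, vbNewLine)
    For i = LBound(lines) To UBound(lines)
        words = Split(lines(i), " ")
        For j = LBound(words) To UBound(words)
            words(j) = MaskWord(words(j))
        Next
        lines(i) = Join(words, " ")
    Next
    MaskLines = Join(lines, vbNewLine)
End Function
```

Splitting and re-joining with the same separators preserves the original spacing and line breaks, including empty lines and runs of several spaces.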

Upvotes: 0

stema

Reputation: 93026

I have no idea about classic ASP, but if it does support (negative) lookbehinds and the only problem is the quantifier in the lookbehind, then why not turn it around and do it this way:

(?<!^)(?<!\s)[a-zA-Z0-9]

and replace with "X".

This means: replace every letter and digit that is not preceded by whitespace and is not at the start of the string/row.

See it here on Regexr

Upvotes: 1

Tim Pietzcker

Reputation: 336418

Well, in .NET it would be easy:

resultString = Regex.Replace(subjectString, 
    @"(?<=         # Assert that there is before the current position...
     \b            # a word boundary
     \w            # one alphanumeric character (= first letter/digit/underscore)
     [\w.@-]*      # any number of alnum characters or ., @ or -
    )              # End of lookbehind
    [\p{L}\p{N}]   # Match any letter or digit to be replaced", 
    "X", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

The result, though, would be slightly different than what you wrote:

"JXXXX JXXX JXXXXX 

S.X. IXXXXXXXXX CXXXX  
D-XXXX, FXXXX PXXXX, PXXXX 4-X, 
SXXXXXX, PXXXX
JXXXX
PXXXX : 9X-XXX-XXXXXXX FXX : 9X-XXX-XXXXXX
eXXXX : [email protected]"

(observe that Fax has also been changed to FXX)

Without .NET, you could try something like

orgstr = ereg_replace(orgstr, "\b(\w)[\w.@-]*", "$1XXXX", True) ' VBScript writes the backreference as $1 and needs no doubled backslashes

which would give you

"JXXXX JXXXX JXXXX 

SXXXX IXXXX CXXXX  
DXXXX, FXXXX PXXXX, PXXXX 4XXXX, 
SXXXX, PXXXX
JXXXX
PXXXX : 9XXXX FXXXX : 9XXXX
eXXXX : sXXXX"

You won't get it better than that with a single regex.

Upvotes: 1
