chaituse

Reputation: 21

Regex to identify a word containing spaces

I need to identify a string in a text and replace it with an empty string. The problem is that it does not always appear as a contiguous word: there may be space characters between individual letters or groups of letters. For example:

For the word "Decent", I may encounter the following values: D ec ent, De ce nt, De ce n t.

Is there a way to identify these strings with a regular expression, using the word "Decent" as input? I am very new to regular expressions. Please help!

TIA!

Upvotes: 0

Views: 1380

Answers (4)

Keith Nicholas

Reputation: 44288

\bD\s*e\s*c\s*e\s*n\s*t\s*

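With case-insensitive matching enabled, this matches D ec ent, De ce nt, De ce n t, decent, and Decent,

but not blade centimeter, because \b requires the match to start at a word boundary.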
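Here is a minimal Java sketch of how that pattern could be used for the replacement the question asks about. The class name `DecentBoundaryDemo`, the `strip` helper, and the choice of the `CASE_INSENSITIVE` flag are assumptions made for illustration; they are not part of the answer above.

```java
import java.util.regex.Pattern;

class DecentBoundaryDemo {
    // \b anchors the match at the start of a word.
    // CASE_INSENSITIVE is an assumption so that "decent" matches as well.
    static final Pattern DECENT =
            Pattern.compile("\\bD\\s*e\\s*c\\s*e\\s*n\\s*t\\s*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: removes every match of the pattern from the text.
    static String strip(String text) {
        return DECENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("a D ec ent meal"));   // prints "a meal"
        System.out.println(strip("blade centimeter"));  // unchanged: no word boundary before the 'd'
    }
}
```

Note that the trailing \s* also consumes the whitespace after the word, which is why only one space remains in the first result.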

Upvotes: 1

rmalchow

Reputation: 2769

First, a bit of code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class WordsWithSpaces {

    public static void main(String[] args) {
        String test = "Descent D escent De s  cent desce nd";
        String word = "descent";
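        // Build the pattern: every letter of the word followed by "\s*"
        // (zero or more whitespace characters). This assumes the word
        // contains no regex metacharacters such as '.' or '*'.
        String pattern = "";
        for(int i=0; i<word.length();i++) {
            pattern = pattern+word.charAt(i)+"\\s*";
        }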
        System.err.println("pattern is: "+pattern);
        Pattern p = Pattern.compile(pattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(test);
        while(m.find()) {
            String found = test.substring(m.start(),m.end());
            System.err.println(found+" matches");
        }


    }

}

Now for the explanation: \s is a character class for whitespace. In Java this includes spaces, tabs, and line breaks. In this piece of code, I take every character of the word I am looking for and append "\s*", where "*" means zero or more occurrences.

Also, to make the match case-insensitive, I set the CASE_INSENSITIVE flag on the pattern.

Character classes may not have the same names in your programming language of choice, but there should be one for whitespace. Check your documentation.
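Since the question is about replacing the word with an empty string, the same idea could be wrapped into a small helper. This is only a sketch under a few assumptions: the names `SpacedWordRemover`, `spacedPattern`, and `remove` are made up for illustration, each character is passed through `Pattern.quote` so that regex metacharacters in the word are treated literally, and no \s* is appended after the last letter so that whitespace following the word is left alone.

```java
import java.util.regex.Pattern;

class SpacedWordRemover {
    // Builds letter \s* letter \s* ... letter, quoting each character so that
    // characters like '.' or '*' in the word are matched literally.
    static Pattern spacedPattern(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                sb.append("\\s*"); // optional whitespace between letters only
            }
            sb.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Replaces every spaced-out occurrence of the word with an empty string.
    static String remove(String text, String word) {
        return spacedPattern(word).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The spaces on either side of the removed word are kept.
        System.out.println("[" + remove("x D ec ent y", "decent") + "]");
    }
}
```

Whether the surrounding whitespace should be removed too depends on the text being cleaned, so that choice is left to the caller here.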

Upvotes: 0

Jim Mischel

Reputation: 133975

The expression "D\s*e\s*c\s*e\s*n\s*t" will do it. Each letter may be followed by zero or more whitespace characters; \s matches any whitespace, not just spaces. You could replace \s* with " *" (a space followed by an asterisk) if you only want literal spaces.
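To make the difference between \s* and " *" concrete, here is a small Java sketch. The class and field names are hypothetical, and the tab-separated input is an invented test case rather than something from the question.

```java
import java.util.regex.Pattern;

class WhitespaceVsSpaceDemo {
    // \s* accepts any whitespace (spaces, tabs, line breaks) between letters.
    static final Pattern ANY_WHITESPACE = Pattern.compile("D\\s*e\\s*c\\s*e\\s*n\\s*t");
    // " *" accepts only literal space characters between letters.
    static final Pattern SPACES_ONLY = Pattern.compile("D *e *c *e *n *t");

    public static void main(String[] args) {
        String withSpaces = "De ce n t";
        String withTab = "De\tcent";

        System.out.println(ANY_WHITESPACE.matcher(withSpaces).find()); // true
        System.out.println(SPACES_ONLY.matcher(withSpaces).find());    // true
        System.out.println(ANY_WHITESPACE.matcher(withTab).find());    // true
        System.out.println(SPACES_ONLY.matcher(withTab).find());       // false
    }
}
```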

Upvotes: 1

Diego Torres Milano

Reputation: 69198

If you use

'D ?e ?c ?e ?n ?t ?'

it will match the word with at most one extra space after each letter.
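A short Java sketch of what " ?" does and does not cover, compared with " *". The class and field names are hypothetical, and the double-space input is an invented case to show the limit of " ?".

```java
import java.util.regex.Pattern;

class OptionalSpaceDemo {
    // " ?" allows zero or one space after each letter.
    static final Pattern AT_MOST_ONE = Pattern.compile("D ?e ?c ?e ?n ?t ?");
    // " *" allows any number of spaces after each letter.
    static final Pattern ANY_NUMBER = Pattern.compile("D *e *c *e *n *t *");

    public static void main(String[] args) {
        String singleSpaces = "D ec ent";
        String doubleSpace = "De  cent"; // two spaces between "De" and "cent"

        System.out.println(AT_MOST_ONE.matcher(singleSpaces).find()); // true
        System.out.println(AT_MOST_ONE.matcher(doubleSpace).find());  // false
        System.out.println(ANY_NUMBER.matcher(doubleSpace).find());   // true
    }
}
```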

Upvotes: 1
