leba-lev

Reputation: 2896

Split a string based on a pattern in Java: capital letters and numbers

I have the string "3/4Ton" and want to split it so that:

word[1] = 3/4 and word[2] = Ton.

Right now my code looks like this:

Pattern p = Pattern.compile("[A-Z]{1}[a-z]+");
Matcher m = p.matcher(line);
while (m.find()) {
    System.out.println("The word --> " + m.group());
}

It does split a string on capital letters, for example:

String = MachineryInput

word[1] = Machinery, word[2] = Input

The only problem is that it does not preserve numbers, abbreviations, or sequences of capital letters that are not meant to be separate words. Could someone help me fix my regular expression?

Thanks in advance...
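
To make the problem concrete, here is a minimal, self-contained sketch of the behaviour described above. The class name FindWordsDemo and the helper findWords are made up for illustration; only the pattern itself comes from the snippet in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex is taken from the question.
class FindWordsDemo {
    // One capital letter followed by one or more lowercase letters.
    private static final Pattern WORD = Pattern.compile("[A-Z]{1}[a-z]+");

    // Collects every match of WORD in the input, in order of appearance.
    static List<String> findWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("MachineryInput")); // [Machinery, Input]
        System.out.println(findWords("3/4Ton"));         // [Ton]
    }
}
```

Because the loop only collects text that matches the pattern, anything that is not a capitalised word (here "3/4") is silently dropped rather than kept as its own token.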

Upvotes: 5

Views: 9007

Answers (2)

Sean Patrick Floyd

Reputation: 299048

You can actually do this with regex alone, using lookahead and lookbehind (see "Special constructs" on this page: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html )

/**
 * We'll use this pattern as a divider to split the string into an array.
 * Usage: myString.split(DIVIDER_PATTERN);
 */
private static final String DIVIDER_PATTERN =
        // either a character that is not an uppercase letter,
        // followed by an uppercase letter
        "(?<=[^\\p{Lu}])(?=\\p{Lu})"
        // or a lowercase letter followed by a digit
        + "|(?<=[\\p{Ll}])(?=\\d)";

@Test
public void testStringSplitting() {
    assertEquals(2, "3/4Word".split(DIVIDER_PATTERN).length);
    assertEquals(7, "ManyManyWordsInThisBigThing".split(DIVIDER_PATTERN).length);
    assertEquals(7, "This123/4Mixed567ThingIsDifficult"
                        .split(DIVIDER_PATTERN).length);
}

So what you can do is something like this:

for(String word: myString.split(DIVIDER_PATTERN)){
    System.out.println(word);
}
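
As a rough, runnable sketch of what that loop yields, the class below prints the actual tokens for a few inputs. The class name SplitDemo, the helper split, and the input "parseHTMLFile" are assumptions added for illustration; the pattern is the DIVIDER_PATTERN from above.

```java
import java.util.Arrays;

// Hypothetical demo class wrapping the divider pattern from this answer.
class SplitDemo {
    private static final String DIVIDER_PATTERN =
            "(?<=[^\\p{Lu}])(?=\\p{Lu})"   // non-uppercase char, then uppercase letter
            + "|(?<=[\\p{Ll}])(?=\\d)";    // lowercase letter, then digit

    // Splits at the zero-width positions matched by DIVIDER_PATTERN.
    static String[] split(String s) {
        return s.split(DIVIDER_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("3/4Ton")));
        // [3/4, Ton]
        System.out.println(Arrays.toString(split("This123/4Mixed567ThingIsDifficult")));
        // [This, 123/4, Mixed, 567, Thing, Is, Difficult]
        System.out.println(Arrays.toString(split("parseHTMLFile")));
        // [parse, HTMLFile]
    }
}
```

Since both alternatives are zero-width, no characters are consumed by the split. A run of capitals is never broken up, but note that it stays attached to the word that follows it ("HTMLFile"), which may or may not be what you want for abbreviations.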

Sean

Upvotes: 5

corsiKa

Reputation: 82589

Using regex would be nice here. I bet there's a way to do it too, although I'm not a swing-in-on-a-vine regex guy so I can't help you. However, there's something you can't avoid - something, somewhere needs to loop over your String eventually. You could do this "on your own" like so:

// requires: import java.util.ArrayList;
static String[] splitOnCapitals(String str) {
    ArrayList<String> array = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        // every uppercase character starts a new piece
        if (Character.isUpperCase(str.charAt(i))) {
            String line = builder.toString().trim();
            if (line.length() > 0) array.add(line);
            builder = new StringBuilder();
        }
        builder.append(str.charAt(i));
    }
    // get the last little bit too, unless it is empty
    String last = builder.toString().trim();
    if (last.length() > 0) array.add(last);
    return array.toArray(new String[0]);
}

I tested it with the following test driver:

public static void main(String[] args) {
    String test = "3/4 Ton truCk";
    String[] arr = splitOnCapitals(test);
    for(String s : arr) System.out.println(s);

    test = "Start with Capital";
    arr = splitOnCapitals(test);
    for(String s : arr) System.out.println(s);
}

And got the following output:

3/4
Ton tru
Ck
Start with
Capital
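
The loop above starts a new piece at every capital, so a run of capitals such as an abbreviation would be cut into single letters. One possible variation, sketched here as an assumption rather than as part of the tested code above, is to start a new piece only when the previous character is not itself uppercase. The names SplitOnCapitalRuns and splitOnCapitalRuns, and the input "USAToday", are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variation: keeps runs of capitals together.
class SplitOnCapitalRuns {
    static String[] splitOnCapitalRuns(String str) {
        List<String> parts = new ArrayList<>();
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            // A capital starts a new piece only if the previous char is not a capital.
            boolean startsPiece = Character.isUpperCase(c)
                    && i > 0
                    && !Character.isUpperCase(str.charAt(i - 1));
            if (startsPiece) {
                String part = builder.toString().trim();
                if (!part.isEmpty()) parts.add(part);
                builder.setLength(0); // reuse the builder for the next piece
            }
            builder.append(c);
        }
        String last = builder.toString().trim();
        if (!last.isEmpty()) parts.add(last);
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String s : splitOnCapitalRuns("3/4Ton")) System.out.println(s);   // 3/4, then Ton
        for (String s : splitOnCapitalRuns("USAToday")) System.out.println(s); // USAToday
    }
}
```

Like the regex approach, this keeps a run of capitals glued to the word that follows it; separating "USA" from "Today" would need an extra rule about the character after the run.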

Upvotes: 2
