user678392

Reputation: 2031

Regular expression for finding two words in a string

Here is my basic problem: I am reading some lines in from a file. The format of each line in the file is this:

John Doe    123

There is a tab between Doe and 123.

I'm looking for a regex such that I can "pick off" the John Doe. Something like scanner.next(regular expression) that would give me the John Doe.

This is probably very simple, but I can't seem to get it to work. Also, I'm trying to figure this out without having to rely on the tab being there.

I've looked at "Regular Expression regex to validate input: Two words with a space between", but none of those answers worked; I kept getting runtime errors.

Some Code:

while(inFile.hasNextLine()){
    String s = inFile.nextLine();
    Scanner string = new Scanner(s);
    System.out.println(s); // check to make sure I got the string
    System.out.println(string.next("[A-Za-z]+ [A-Za-z]+")); // This doesn't work for me
    System.out.println(string.next("\\b[A-Za-z ]+\\b")); // Nor does this
}

Upvotes: 2

Views: 1874

Answers (4)

Jasonw

Reputation: 5064

Do you prefer simplicity and readability? If so, consider the following solution, which splits each line on the tab instead of using a regex:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MyLineScanner
{

    public static void readLine(String source_file) throws FileNotFoundException
    {
        File source = new File(source_file);
        Scanner line_scanner = new Scanner(source);

        while(line_scanner.hasNextLine())
        {
            String line = line_scanner.nextLine();

            // Echo the line to confirm it was read
            System.out.println(line);

            // Split the line on tabs; the name is the first token (this works for me)
            Scanner words_scanner = new Scanner(line);
            words_scanner.useDelimiter("\t");

            while (words_scanner.hasNext())
            {
                System.out.format("word : %s %n", words_scanner.next());
            }
        }

    }



    public static void main(String[] args) throws FileNotFoundException
    {
        readLine("source.txt");

    }

}

Upvotes: 0

Bob Kuhar

Reputation: 11100

This basically works to isolate John Doe from the rest...

// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
public String isolateAndTrim( String candidate ) {
    // Group 1 captures the two words ("John Doe"); the trailing digits are optional
    Pattern pattern = Pattern.compile( "(\\w+\\s+\\w+)\\s+\\d*" );
    Matcher matcher = pattern.matcher( candidate );
    String clean = "";
    if ( matcher.matches() ) {
        clean = matcher.group( 1 );
        // Collapse any run of whitespace inside the name to a single space
        clean = clean.replaceAll( "\\s+", " " );
    }
    return clean;
}

The grouping parentheses let you "pick off" the name portion from the digit portion: "John Doe", "Jane Austin", whatever. Capture groups are worth learning, as they work well for problems just like this one.

The trick to remove the extra whitespace comes from "How to remove duplicate white spaces in string using Java?"

Upvotes: 0

mathematical.coffee

Reputation: 56905

It would help if you provided the code you're trying that is giving you runtime errors.

You could use regex:

[A-Za-z]+ [A-Za-z]+

if you know the name will always be exactly two words.

You could also try

\b[A-Za-z ]+\b

which matches any number of words made up of letters. The '\b' is a word boundary, so the match covers whole words and returns "John Doe" rather than "John Doe " (with the trailing space). Don't forget that backslashes need to be escaped in Java string literals, i.e. "\\b".
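A likely reason for the runtime errors, even without seeing the stack trace: Scanner.next(pattern) tests the pattern against the next token only, and tokens are split on whitespace by default, so a pattern that contains a space can never match the token "John" and next throws an InputMismatchException. Scanner.findInLine ignores the delimiters and searches the rest of the line instead. A minimal sketch, where the class name NameFinder and the helper findName are made up for illustration, and \s+ is used in place of the single space so the gap between the words can be any whitespace:

```java
import java.util.Scanner;

public class NameFinder {

    // Hypothetical helper: returns the first "word word" match in the line,
    // or null if the line contains no such match.
    public static String findName(String line) {
        try (Scanner sc = new Scanner(line)) {
            // findInLine searches the line and ignores the scanner's delimiters,
            // so the pattern is allowed to span whitespace.
            return sc.findInLine("[A-Za-z]+\\s+[A-Za-z]+");
        }
    }

    public static void main(String[] args) {
        System.out.println(findName("John Doe\t123")); // prints "John Doe"
    }
}
```

This does not depend on the tab being there; the only assumption is that the name is two runs of letters separated by whitespace.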

Upvotes: 0

Tim

Reputation: 14446

Are you required to use regex for this? You could simply split each line on \t and grab the first or second element (I'm not sure which you meant by "pick off" John Doe).
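A rough sketch of the split idea, assuming the name is the first field on the line. The class and method names are made up for illustration. Since the question asks not to rely on the tab, the second helper is a variation (not part of the suggestion above) that splits on whitespace followed by a digit; String.split takes a regex, so this is still technically a regex, just a very small one:

```java
public class SplitName {

    // Hypothetical helper: everything before the first tab.
    public static String nameBeforeTab(String line) {
        return line.split("\t")[0];
    }

    // Hypothetical variation: everything before the whitespace that precedes
    // the number, whether that whitespace is a tab or spaces.
    public static String nameBeforeNumber(String line) {
        // The lookahead (?=\d) requires a digit right after the whitespace,
        // so the space inside "John Doe" is not a split point.
        return line.split("\\s+(?=\\d)")[0];
    }

    public static void main(String[] args) {
        System.out.println(nameBeforeTab("John Doe\t123"));    // prints "John Doe"
        System.out.println(nameBeforeNumber("John Doe  123")); // prints "John Doe"
    }
}
```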

Upvotes: 1
