Reputation: 2031
Here is my basic problem: I am reading some lines in from a file. The format of each line in the file is this:
John Doe 123
There is a tab between Doe and 123.
I'm looking for a regex such that I can "pick off" the John Doe. Something like scanner.next(regular expression) that would give me the John Doe.
This is probably very simple, but I can't seem to get it to work. Also, I'm trying to figure this out without having to rely on the tab being there.
I've looked here: Regular Expression regex to validate input: Two words with a space between. But none of these answers worked. I kept getting runtime errors.
Some Code:
while (inFile.hasNextLine()) {
    String s = inFile.nextLine();
    Scanner string = new Scanner(s);
    System.out.println(s); // check to make sure I got the string
    System.out.println(string.next("[A-Za-z]+ [A-Za-z]+")); // This doesn't work for me
    System.out.println(string.next("\\b[A-Za-z ]+\\b")); // Nor does this
}
Upvotes: 2
Views: 1874
Reputation: 5064
Do you prefer simplicity and readability? If so, consider the following solution:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MyLineScanner
{
    public static void readLine(String source_file) throws FileNotFoundException
    {
        File source = new File(source_file);
        // try-with-resources closes the file scanner when reading is done
        try (Scanner line_scanner = new Scanner(source))
        {
            while (line_scanner.hasNextLine())
            {
                String line = line_scanner.nextLine();
                // echo the raw line to confirm it was read
                System.out.println(line);
                // split the line on tabs and print each field; this works for me
                Scanner words_scanner = new Scanner(line);
                words_scanner.useDelimiter("\t");
                while (words_scanner.hasNext())
                {
                    System.out.format("word : %s %n", words_scanner.next());
                }
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        readLine("source.txt");
    }
}
Upvotes: 0
Reputation: 11100
This basically works to isolate John Doe from the rest...
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String isolateAndTrim( String candidate ) {
    // This pattern isolates "John Doe" as a group...
    Pattern pattern = Pattern.compile( "(\\w+\\s+\\w+)\\s+\\d*" );
    Matcher matcher = pattern.matcher( candidate );
    String clean = "";
    if ( matcher.matches() ) {
        clean = matcher.group( 1 );
        // This replaceAll reduces away extraneous whitespace...
        clean = clean.replaceAll( "\\s+", " " );
    }
    // An empty string is returned when the line does not match the pattern.
    return clean;
}
The grouping parentheses let you "pick off" the name portion from the digit portion: "John Doe", "Jane Austin", whatever. Capture groups are worth learning, as they work well for problems just like this one.
The trick to remove the extra whitespace comes from How to remove duplicate white spaces in string using Java?
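To make the grouping idea concrete, here is a minimal sketch that captures both the name and the number as separate groups. The class name NameNumberSplitter, the method split, and the stricter letter-only pattern are my own assumptions for illustration, not part of the method above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class NameNumberSplitter {
    // Group 1: two letter-only words separated by whitespace.
    // Group 2: the trailing run of digits.
    private static final Pattern LINE =
            Pattern.compile("\\s*([A-Za-z]+\\s+[A-Za-z]+)\\s+(\\d+)\\s*");

    /** Returns {name, number}, or null when the line does not fit the assumed format. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Collapse any run of whitespace inside the name to a single space.
        String name = m.group(1).replaceAll("\\s+", " ");
        return new String[] { name, m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("John   Doe\t123");
        System.out.println(parts[0] + " | " + parts[1]); // prints "John Doe | 123"
    }
}
```

Because the separator is written as \s+, this version accepts either a tab or spaces between the name and the number.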
Upvotes: 0
Reputation: 56905
It would help if you provided the code you're trying that is giving you runtime errors.
You could use regex:
[A-Za-z]+ [A-Za-z]+
if you always knew your name was going to be two words.
You could also try
\b[A-Za-z ]+\b
which matches any number of words made up of letters. The '\b' word boundaries make sure it captures whole words, so it returns "John Doe" instead of "John Doe " (with the trailing space too). Don't forget that backslashes need to be escaped in Java string literals.
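One likely reason these patterns throw at runtime in the question's code: Scanner.next(pattern) tests the pattern against the next whole token, and with the default whitespace delimiter that token is just "John", which can never match a pattern containing a space. A minimal sketch of one way around this is Scanner.findInLine, which ignores delimiters. The class and method names below are hypothetical.

```java
import java.util.Scanner;

// Hypothetical demo class; the names here are illustrative only.
class FindInLineDemo {
    /** Returns the first "word space word" run on the line, or null if there is none. */
    static String pickOffName(String line) {
        Scanner sc = new Scanner(line);
        // findInLine searches the rest of the current line and ignores the
        // scanner's delimiters, so the pattern is allowed to contain a space.
        String name = sc.findInLine("[A-Za-z]+ [A-Za-z]+");
        sc.close();
        return name;
    }

    public static void main(String[] args) {
        System.out.println(pickOffName("John Doe\t123")); // prints "John Doe"
    }
}
```

Since the pattern only looks at the letters and the single space between them, it does not matter whether a tab or a space precedes the number.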
Upvotes: 0
Reputation: 14446
Are you required to use regex for this? You could simply use a split method across \t on each line and just grab the first or second element (I'm not sure which you meant by 'pick off' John Doe).
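A minimal sketch of the split approach, assuming the name is the first field. The class and method names are hypothetical, and the second method is my own variant (not part of the suggestion above) for the case where the tab cannot be relied on: it splits on whitespace that is immediately followed by a digit.

```java
// Hypothetical demo class; the names here are illustrative only.
class SplitDemo {
    /** Splits at the first tab and returns the first field. */
    static String nameByTab(String line) {
        String[] fields = line.split("\t", 2);
        return fields[0].trim();
    }

    /** Assumed variant: splits at the first whitespace run that precedes a digit. */
    static String nameByDigitBoundary(String line) {
        // (?=\\d) is a lookahead, so the digit itself is not consumed by the split.
        String[] fields = line.split("\\s+(?=\\d)", 2);
        return fields[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(nameByTab("John Doe\t123"));          // prints "John Doe"
        System.out.println(nameByDigitBoundary("John Doe 123")); // prints "John Doe"
    }
}
```

If a line contains no separator at all, both methods return the whole trimmed line, so callers may want to check the result.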
Upvotes: 1