Kevin

Reputation: 6292

Regular expression for extracting information

I have a CSV file with the following data format:

123,"12.5","0.6","15/9/2012 12:11:19"

These values are the order number, the price, the discount rate, and the date and time.

I want to extract these data from the line.

I have tried the regular expression:

String line = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
Pattern pattern = Pattern.compile("(\\W?),\"([\\d\\.\\-]?)\",\"([\\d\\.\\-]?)\",\"([\\W\\-\\:]?)\"");
Scanner scanner = new Scanner(line);
if(scanner.hasNext(pattern)) {
    ...
}else{
    // Always ends up here
}

It looks like my pattern is not correct, as execution always reaches the else branch. What did I do wrong? Can someone suggest a solution for this?

Many thanks.

Upvotes: 0

Views: 79

Answers (5)

felixgaal

Reputation: 2423

This is a possible solution to your situation:

    String line = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
    Pattern pattern = Pattern.compile("([0-9]+),\\\"([0-9.]+)\\\",\\\"([0-9.]+)\\\",\\\"([0-9/:\\s]+)\\\"");
    Scanner scanner = new Scanner(line);
    scanner.useDelimiter("\n");
    if(scanner.hasNext(pattern)) {
        MatchResult result = scanner.match();
        System.out.println("1st: " + result.group(1));
        System.out.println("2nd: " + result.group(2));
        System.out.println("3rd: " + result.group(3));
        System.out.println("4th: " + result.group(4));
    }else{
        System.out.println("There");
    }

Note that ? means 0 or 1 occurrences, whereas + means 1 or more.

Observe the use of 0-9 for digits. You can also use \d if you like. Because the date field contains a space and Scanner splits tokens on whitespace by default, you must change the scanner's delimiter, for example with scanner.useDelimiter("\n").

The output of this snippet is:

1st: 123
2nd: 12.5
3rd: 0.6
4th: 15/9/2012 12:11:19

Upvotes: 0

Reimeus

Reputation: 159754

Regular expressions are very cumbersome for this type of work.

I suggest using a CSV library such as OpenCSV instead.

The library can parse each line into a String array, and individual entries can then be converted as required. Here is an OpenCSV example for the specific problem:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
   int orderNumber = Integer.parseInt(nextLine[0]);
   double price = Double.parseDouble(nextLine[1]);
   double discountRate = Double.parseDouble(nextLine[2]);
   ...
}

Full documentation and examples can be found here

Upvotes: 1

Pshemo

Reputation: 124215

scanner.hasNext(pattern)

From the documentation for this method:

Returns true if the next complete token matches the specified pattern.

But the next token is 123,"12.5","0.6","15/9/2012 because Scanner splits tokens on whitespace by default, and the date field contains a space.
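The tokenization described above can be checked with a small sketch. The class and method names (DefaultTokenDemo, firstToken) are made up for illustration; only the Scanner calls come from the standard library.

```java
import java.util.Scanner;

public class DefaultTokenDemo {
    // Returns the first token produced by a Scanner that still uses
    // its default delimiter (whitespace).
    static String firstToken(String line) {
        try (Scanner scanner = new Scanner(line)) {
            return scanner.next();
        }
    }

    public static void main(String[] args) {
        String line = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
        // The token ends at the space inside the date field,
        // so a pattern for the whole row can never match it.
        System.out.println(firstToken(line)); // 123,"12.5","0.6","15/9/2012
    }
}
```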

There are also a few problems with your regex:

  • you used ?, which means zero or one, where you should use * (zero or more) or + (one or more);
  • you used \\W at the start, but \\W matches only non-word characters, so it cannot match the digits in 123.

If you really want to use Scanner and a regex, then try

Pattern.compile("(\\d+),\"([^\"]+)\",\"([^\"]+)\",\"([^\"]+)\"");

and change the delimiter to the line separator with

scanner.useDelimiter(System.lineSeparator());
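Putting the two pieces together might look like the sketch below. It is only an illustration: ScannerLineDemo and parseFirstLine are hypothetical names, and it assumes the input uses the platform line separator (or is a single line, as in the question).

```java
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class ScannerLineDemo {
    // The pattern suggested above: digits, then three quoted fields.
    private static final Pattern ROW =
            Pattern.compile("(\\d+),\"([^\"]+)\",\"([^\"]+)\",\"([^\"]+)\"");

    // Returns the four captured fields of the first line,
    // or null if that line does not match the pattern.
    static String[] parseFirstLine(String text) {
        try (Scanner scanner = new Scanner(text)) {
            // Make each whole line a single token instead of splitting on spaces.
            scanner.useDelimiter(System.lineSeparator());
            if (!scanner.hasNext(ROW)) {
                return null;
            }
            scanner.next(ROW);
            MatchResult m = scanner.match();
            return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
        }
    }

    public static void main(String[] args) {
        String line = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
        for (String field : parseFirstLine(line)) {
            System.out.println(field);
        }
    }
}
```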

Upvotes: 0

Oleg Pyzhcov

Reputation: 7353

? in regex means "zero or one occurrence". You probably wanted to use + instead (one or more) so it could capture all the digits, points, colons, etc.
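A quick way to see the difference is to test both quantifiers against one of the fields. This is a minimal sketch; QuantifierDemo and matchesWhole are names invented for the example.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // ? allows at most one character from the class, so "12.5" is too long.
        System.out.println(matchesWhole("[\\d.]?", "12.5")); // false
        // + allows one or more, so the whole field matches.
        System.out.println(matchesWhole("[\\d.]+", "12.5")); // true
        // A single character is still fine with ?.
        System.out.println(matchesWhole("[\\d.]?", "1"));    // true
    }
}
```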

Upvotes: 0

Josh M

Reputation: 11939

This seems a bit overcomplicated. Try splitting on the most obvious common delimiter between the elements, which is the comma. Perhaps something like this:

    final String info = "123,\"12.5\",\"0.6\",\"15/9/2012 12:11:19\"";
    final String[] split = info.split(",");
    final int orderNumber = Integer.parseInt(split[0]);
    final double price = Double.parseDouble(split[1].replace("\"", ""));
    final double discountRate = Double.parseDouble(split[2].replace("\"", ""));
    final String date = split[3].replace("\"", "");
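If the date is needed as a date object rather than a String, one option is java.time. The sketch below assumes the field is day/month/year followed by a 24-hour time, as the sample value 15/9/2012 12:11:19 suggests; DateFieldDemo and parseDate are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateFieldDemo {
    // Assumed layout: day/month/year without zero padding, then a 24-hour time.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("d/M/yyyy H:mm:ss");

    // Strips the surrounding quotes, as in the split example, then parses the value.
    static LocalDateTime parseDate(String quotedField) {
        return LocalDateTime.parse(quotedField.replace("\"", ""), FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parseDate("\"15/9/2012 12:11:19\"")); // 2012-09-15T12:11:19
    }
}
```

A value that does not follow this layout makes parseDate throw a DateTimeParseException, so real input may need a try/catch around the call.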

Upvotes: 1
