sergionni

Reputation: 13510

Multiline regexp matcher

There is an input file with the following content:
XX00002200000
XX00003300000

regexp:

(.{6}22.{5}\W)(.{6}33.{5})

I tried it in The Regex Coach (an app for testing regexps), and the strings matched OK.

Java:

        pattern = Pattern.compile(patternString);
        inputStream = resource.getInputStream();

        scanner = new Scanner(inputStream, charsetName);
        scanner.useDelimiter("\r\n");

patternString is the regexp mentioned above, injected as a bean property from an .xml file.

The match fails when run from Java.

Upvotes: 2

Views: 1716

Answers (3)

Gaurav Saxena

Reputation: 4297

Pardon my ignorance, but I am still not sure what exactly you are trying to search for. If you are trying to find the following string (including the line break)

XX00002200000
XX00003300000

then why are you reading the input with a line-break delimiter?

To read the above string as is, the following code works:

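Pattern p = Pattern.compile(".{6}22.{5}\\W+.{6}33.{5}");

        // try-with-resources closes the stream even if reading fails
        try (FileInputStream in = new FileInputStream("C:\\new.txt")) {
            byte[] buf = new byte[100];   // large enough for the two sample lines
            int n = in.read(buf);         // number of bytes actually read, or -1
            // decode only the bytes that were read, not the unused tail of the buffer
            String s = new String(buf, 0, Math.max(n, 0));
            Matcher m = p.matcher(s);
            if (m.find())
                System.out.println(m.group());
        } catch (IOException e) {
            e.printStackTrace();
        }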

NB: here the file new.txt contains the string

XX00002200000
XX00003300000

Upvotes: 0

Margus

Reputation: 20038

Simple solution: ".{6}22.{5}\\s+.{6}33.{5}". Note that \s+ matches one or more consecutive whitespace characters.

Here's an example:

 public static void main(String[] argv) {
  String input = "yXX00002200000\r\nXX00003300000\nshort";
  String regex = ".{6}22.{5}\\s+.{6}33.{5}";
  Pattern pattern = Pattern.compile(regex);
  Matcher m = pattern.matcher(input);

  // print every two-line match found in the input
  while (m.find()) {
   System.out.println(m.group());
  }
 }

With output:

XX00002200000
XX00003300000
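
One possible reason the original pattern fails, assuming the file uses Windows (\r\n) line endings: \W consumes exactly one character, and in Java the dot does not match line terminators unless DOTALL is set. The sketch below compares the question's pattern with a \s+ variant on both kinds of line endings. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class LineEndingDemo {
    // Pattern from the question: a single \W between the two lines.
    static final Pattern SINGLE = Pattern.compile("(.{6}22.{5}\\W)(.{6}33.{5})");
    // Variant that accepts one or more whitespace characters between the lines.
    static final Pattern MULTI = Pattern.compile("(.{6}22.{5})\\s+(.{6}33.{5})");

    // Returns true if the pattern is found anywhere in the input.
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String lf = "XX00002200000\nXX00003300000";     // Unix line ending
        String crlf = "XX00002200000\r\nXX00003300000"; // Windows line ending

        System.out.println("\\W  on LF:   " + found(SINGLE, lf));
        System.out.println("\\W  on CRLF: " + found(SINGLE, crlf));
        System.out.println("\\s+ on LF:   " + found(MULTI, lf));
        System.out.println("\\s+ on CRLF: " + found(MULTI, crlf));
    }
}
```

With CRLF input, \W matches the \r and the following .{6} would have to start at \n, which the dot refuses, so only the \s+ variant finds a match there.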

To experiment with Java regexes you can use Regular Expression Editor (a free online editor).

Edit: I think you are altering the input while reading the data. Try:

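public static String readFile(String filename) throws FileNotFoundException {
    Scanner sc = new Scanner(new File(filename));

    StringBuilder sb = new StringBuilder();
    // nextLine() strips the line terminator, so put one back;
    // otherwise the \s+ between the two lines has nothing to match
    while (sc.hasNextLine())
        sb.append(sc.nextLine()).append('\n');
    sc.close();

    return sb.toString();
}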

Or

static String readFile(String path) throws IOException {
    // try-with-resources closes both the stream and the channel,
    // and avoids dereferencing a null stream or buffer after a failure
    try (FileInputStream stream = new FileInputStream(new File(path));
         FileChannel channel = stream.getChannel()) {
        MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0,
                channel.size());
        // decode the mapped bytes using the platform default charset
        return Charset.defaultCharset().decode(buffer).toString();
    }
}

With imports like:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
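
Since the question starts from an InputStream and a charset name rather than a file path, here is a minimal sketch of the same idea for that case. It uses the common idiom of setting the Scanner delimiter to \A (start of input) so the whole stream comes back as one token, line breaks included. The class name, the helper names, and the in-memory sample stream are assumptions for illustration, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeStreamMatch {
    // Reads the entire stream as a single token. \A matches only at the
    // start of input, so the scanner never finds a place to split.
    static String readAll(InputStream in, String charsetName) {
        try (Scanner scanner = new Scanner(in, charsetName)) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Stand-in for resource.getInputStream() from the question.
        byte[] data = "XX00002200000\r\nXX00003300000\r\n".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(data), "UTF-8");

        String match = firstMatch(".{6}22.{5}\\s+.{6}33.{5}", text);
        System.out.println(match != null ? "matched:\n" + match : "no match");
    }
}
```

Reading everything into one string is fine for small files like the sample; for large inputs a streaming approach would be a better fit.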

Upvotes: 2

Emil

Reputation: 13789

Try changing the delimiter:

 scanner.useDelimiter("\\s+");

Also, why don't you use a more general regular expression like this:

 ".{6}[0-9]{2}.{5}"

The regex you mentioned above spans two lines. Since you have set the delimiter to a line break, each token the scanner returns is a single line, so you should use a regular expression that fits a single line.
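
A minimal sketch of this per-line approach, assuming each record sits on its own line: the scanner splits on whitespace and every token is checked against the single-line pattern. The class and method names are hypothetical. Note that [0-9]{2} accepts any two digits in that position, not only 22 or 33.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class PerLineMatch {
    // Splits the input on runs of whitespace and keeps the tokens
    // that match the single-line pattern in full.
    static List<String> matchingLines(String input, String lineRegex) {
        Pattern linePattern = Pattern.compile(lineRegex);
        List<String> hits = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s+");
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (linePattern.matcher(token).matches()) {
                    hits.add(token);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "XX00002200000\r\nXX00003300000\r\nshort";
        // "short" is too short for the pattern and is skipped.
        for (String line : matchingLines(input, ".{6}[0-9]{2}.{5}")) {
            System.out.println(line);
        }
    }
}
```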

Upvotes: 0
