Javakid

Reputation: 357

How to get a substring with a pattern in Java

I have a file containing records as below:

drwxr-xr-x   - root supergroup          0 2015-04-05 05:26 /user/root
drwxr-xr-x   - hadoop supergroup          0 2014-11-05 11:56 /user/root/input
drwxr-xr-x   - hadoop supergroup          0 2014-11-05 03:06 /user/root/input/foo
drwxr-xr-x   - hadoop supergroup          0 2015-04-28 03:06 /user/root/input/foo/bar
drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706
-rw-r--r--   3 hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_logs
drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_logs/history

In my Java code, I use the Pattern and Matcher classes to extract substrings that I want to process later. The code is as follows:

String filename = "D:\\temp\\files_in_hadoop_temp.txt";
Pattern thePattern
    = Pattern.compile("[a-z\\-]+\\s+(\\-|[0-9]) (root|hadoop)\\s+supergroup\\s+([0-9]+) ([0-9\\-]+) ([0-9:]+) (\\D+)\\/?.*");

try
{
    Files.lines(Paths.get(filename))
            .map(line -> thePattern.matcher(line))
            .collect(Collectors.toList())
            .forEach(theMather -> {
                if (theMather.find())
                {
                    System.out.println(theMather.group(3) + "-" + theMather.group(4) + "-" + theMather.group(6));
                }
            });
} catch (IOException e)
{
    e.printStackTrace();
}

The output is as follows:

0-2015-04-05-/user/root
0-2014-11-05-/user/root/input
0-2014-11-05-/user/root/input/foo
0-2015-04-28-/user/root/input/foo/bar
0-2013-11-06-/user/root/input/foo/bar/
0-2013-11-06-/user/root/input/foo/bar/
0-2013-11-06-/user/root/input/foo/bar/
0-2013-11-06-/user/root/input/foo/bar/

However, I expect the last four rows to have no trailing "/", like the first four rows. I have tried many patterns to strip the trailing "/", but none worked.

Could you suggest a pattern that strips the trailing "/"?

Thanks a lot.

Upvotes: 0

Views: 165

Answers (2)

ajb

Reputation: 31699

Use a character class to make sure the last character of the group isn't a slash. Thus, instead of

(\\D+)\\/?.*

try

(\\D*[^\\d/]).*

The part in parentheses matches the longest run of non-digit characters, with the added restriction that its last character may not be a slash. When the run would end in a slash, the regex engine backtracks to the character before it.

Note: Tested.
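
As a rough, self-contained check, the sketch below puts the suggested group into the pattern from the question and runs it on two of the sample lines. The class name TrailingSlashPattern and the helper extract are made up for this illustration and are not part of the original code. Note that the path group still stops at the first digit, as in the question's pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the last capture group differs from the question's pattern.
class TrailingSlashPattern {
    static final Pattern LINE = Pattern.compile(
            "[a-z\\-]+\\s+(\\-|[0-9]) (root|hadoop)\\s+supergroup\\s+([0-9]+) "
            + "([0-9\\-]+) ([0-9:]+) (\\D*[^\\d/]).*");

    // Returns "size-date-path" for a matching line, or null when the line does not match.
    static String extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(3) + "-" + m.group(4) + "-" + m.group(6);
    }

    public static void main(String[] args) {
        String[] samples = {
            "drwxr-xr-x   - root supergroup          0 2015-04-05 05:26 /user/root",
            "drwxr-xr-x   - hadoop supergroup          0 2013-11-06 15:54 /user/root/input/foo/bar/20120706/_logs"
        };
        for (String sample : samples) {
            System.out.println(extract(sample));
        }
    }
}
```

For the second sample line, the path group ends at "bar" rather than "bar/", because the final character class rejects both digits and slashes.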

Upvotes: 1

Rod_Algonquin

Reputation: 26198

You can use a simple if statement to check whether the last character is a slash and, if so, remove it with substring:

if (theMather.find())
{
    String data = theMather.group(3) + "-" + theMather.group(4) + "-" + theMather.group(6);
    // Drop a single trailing slash, if present.
    if (data.endsWith("/"))
    {
        data = data.substring(0, data.length() - 1);
    }

    System.out.println(data);
}
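
The same idea can be pulled out into a small helper so it can be tried in isolation. This is only a sketch; the class TrailingSlash and the method stripTrailingSlash are hypothetical names chosen for the example.

```java
// Hypothetical helper class illustrating the post-processing approach.
class TrailingSlash {
    // Removes one trailing '/' if present; any other string is returned unchanged.
    static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSlash("/user/root/input/foo/bar/"));
        System.out.println(stripTrailingSlash("/user/root"));
    }
}
```

Using endsWith instead of charAt also avoids an exception on an empty string, which cannot occur in the snippet above but could if the helper were reused elsewhere.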

Upvotes: 0
