Maggie

Reputation: 8071

Java use regex to extract file name

I need to get a file name from a file's absolute path (I am aware of the file.getName() method, but I cannot use it here). EDIT: I cannot use file.getName() because I don't need only the file name; I need part of the file's path as well (but, again, not the entire absolute path). I need the part of the file's path AFTER a given base path.

Let's say the file is located in the folder:

C:\Users\someUser

On a Windows machine, if I build the pattern string as follows:

String patternStr = "C:\\Users\\someUser\\(.*+)";

I get an exception: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence for backslash.

If I use Pattern.quote(File.pathSeparator):

String patternStr = "C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) + "someUser" + Pattern.quote(File.separator) + "(.*+)";

the resulting pattern string is: C:\Q;\EUsers\Q;\EsomeUser\Q;\E(.*+) which of course has no match with the actual fileName "C:\Users\someUser\myFile.txt".

What am I missing here? What is the proper way to parse file name?

Upvotes: 0

Views: 9447

Answers (10)

Nischal

Reputation: 1

If the file name contains special characters, especially when supporting Mac clients (where special characters are allowed in file names), the server-side Path.GetFileName(fileName) fails and throws an error because of illegal characters in the path. The following code, which uses a regex, comes to the rescue.

The regex takes care of two things:

  1. In IE, when a file is uploaded, the file path contains the folders as well (e.g. c:\samplefolder\subfolder\sample.xls). The expression below replaces all folders with an empty string and retains the file name.

  2. On a Mac, the file name is the only thing supplied, since the browser is Safari, which allows special characters in file names.

    var regExpDir = @"(^[\w]:\\)([\w].+\w\\)";

    // fileName is assumed to be an existing string variable holding the uploaded name
    fileName = Regex.Replace(fileName, regExpDir, string.Empty);

Upvotes: 0

aioobe

Reputation: 420921

If you do want to use a regular expression, you should use

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
                       ^^       ^^          ^^

instead.

Why? Your string literal

"C:\\Users\\someUser\\(.*+)"

is compiled to

C:\Users\someUser\(.*+)

Since \ is used for escaping in regular expressions too, you'll have to escape them "twice".
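To make the double escaping concrete, here is a minimal sketch. The class name EscapedPatternDemo and the helper tail() are made up for illustration; only the pattern string comes from the answer. It compiles the four-backslash version and extracts whatever follows the base directory.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class EscapedPatternDemo {

    // Returns the part of the path after C:\Users\someUser\, or null if the path does not match.
    static String tail(String path) {
        // Four backslashes in the source become two in the string,
        // which the regex engine reads as one literal backslash.
        Pattern p = Pattern.compile("C:\\\\Users\\\\someUser\\\\(.*+)");
        Matcher m = p.matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The subject string is an ordinary string literal, so it needs only two backslashes each.
        System.out.println(tail("C:\\Users\\someUser\\myFile.txt")); // prints "myFile.txt"
    }
}
```

Note that only the pattern needs the extra level of escaping; the path being matched is plain text.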


Regarding your edit:

You probably want to have a look at URI.relativize(). Example:

File base = new File("C:/Users/someUser");
File file = new File("C:/Users/someUser/someDir/someFile.txt");

String relativePath = base.toURI().relativize(file.toURI()).getPath();

System.out.println(relativePath); // prints "someDir/someFile.txt"

(Note that / works as file-separator on Windows machines too.)


Btw, I don't know what you have as File.separator on your system, but if it's set to \, then

"C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) +
    "someUser" + Pattern.quote(File.separator) + "(.*+)";

should yield

C:\Q\\EUsers\Q\\EsomeUser\Q\\E(.*+)
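As a rough sketch of the same idea, Pattern.quote() can be applied to the whole base directory at once instead of to each separator. The class QuotedPrefixDemo and the method after() are hypothetical names, and the separator is hard-coded to a backslash (an assumption, so that the example behaves the same on any platform).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class QuotedPrefixDemo {

    // Returns the part of 'path' that follows 'baseDir' plus one separator, or null if there is no match.
    static String after(String baseDir, String path) {
        String sep = "\\"; // assumed Windows separator; File.separator would be used on a real system
        // Pattern.quote wraps the text in \Q...\E so backslashes are treated literally.
        Pattern p = Pattern.compile(Pattern.quote(baseDir + sep) + "(.*)");
        Matcher m = p.matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String rest = after("C:\\Users\\someUser", "C:\\Users\\someUser\\someDir\\myFile.txt");
        System.out.println(rest); // prints "someDir\myFile.txt"
    }
}
```

Quoting the whole prefix avoids having to think about which characters in the directory name are regex metacharacters.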

Upvotes: 3

Stephen C

Reputation: 718698

I cannot use file.getName() because I don't need the file name only; I need the part of the file's path as well (but again, not the entire absolute path).

OK. So what you want is something like this.

    // Canonicalize paths to deal with ".", "..", symlinks,
    // relative files and case sensitivity issues.
    // Note: getCanonicalPath() can throw IOException.
    String directory = new File(someDirectory).getCanonicalPath();
    String test = new File(somePathname).getCanonicalPath();

    if (!directory.endsWith(File.separator)) {
        directory += File.separator;
    }
    if (test.startsWith(directory)) {
        String pathInDirectory = test.substring(directory.length());
        ...
    }

Advantages:

  • No regexes needed.
  • Doesn't break if the file separator is something other than \.
  • Doesn't break if there are symbolic links on the path.
  • Doesn't break due to case sensitivity issues.
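For comparison, a similar regex-free result can be sketched with java.nio.file.Path, which this answer does not use; treat it as an alternative, not as the method described above. The class RelativePathDemo and the method below() are hypothetical names. One caveat: normalize() is purely lexical, so unlike getCanonicalPath() it does not resolve symbolic links or case differences (Path.toRealPath() would, but it requires the files to exist).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not part of the original answer.
class RelativePathDemo {

    // Returns the part of 'file' below 'base', or null if 'file' is not located under 'base'.
    static Path below(Path base, Path file) {
        // Make both paths absolute and collapse "." and ".." lexically.
        Path b = base.toAbsolutePath().normalize();
        Path f = file.toAbsolutePath().normalize();
        // startsWith compares whole name elements, so "someUser2" is not under "someUser".
        return f.startsWith(b) ? b.relativize(f) : null;
    }

    public static void main(String[] args) {
        Path rest = below(Paths.get("C:/Users/someUser"),
                          Paths.get("C:/Users/someUser/someDir/someFile.txt"));
        // The separator in the printed result depends on the platform.
        System.out.println(rest);
    }
}
```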

Upvotes: 0

user207421

Reputation: 310850

What am I missing here? What is the proper way to parse file name?

The proper way to parse a file name is to use the APIs that are already provided for the purpose. You've stated that you can't use File.getName(), without explanation. You are almost certainly mistaken about that.

Upvotes: 0

FailedDev

Reputation: 26920

Try this:

String ResultString = null;
try {
    Pattern regex = Pattern.compile("([^\\\\/:*?\"<>|\r\n]+$)");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        ResultString = regexMatcher.group(1);
    } 
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Output:

myFile.txt

The same works for the input C:/Users/someUser/myFile.txt:

Output: myFile.txt

Upvotes: 0

Scan backwards from the end of the string to the last file path separator, or to the beginning of the string if there is none.

The file path separator can be / or \ (a volume separator : is treated the same way).

public static final char ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR = '/';
public static final char DIRECTORY_SEPARATOR_CHAR = '\\';
public static final char VOLUME_SEPARATOR_CHAR = ':';


    public static String getFileName(String path) {

        if(path == null || path.isEmpty()) {
            return path;
        }

        int length = path.length();
        int index = length;

        while(--index >= 0) {

            char c = path.charAt(index);

            if(c == ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR || c == DIRECTORY_SEPARATOR_CHAR || c == VOLUME_SEPARATOR_CHAR) {
                return path.substring(index + 1, length); 
            }
        }

        return path;
    }

Try to keep it simple ;-).

Upvotes: 0

Stephen C

Reputation: 718698

What is the proper way to parse file name?

The proper way to parse a file name is to use File(String). Using a regex for this is going to hard-wire platform dependencies into your code. That's a bad idea.

I know you said you can't use File.getName() ... but that is the proper solution. If you would care to say why you can't use File.getName() perhaps I could suggest an alternative solution.

Upvotes: 7

Aleks G

Reputation: 57306

Try putting double-double-backslashes in your pattern. You need a second backslash to escape one in the pattern, plus you'll need to double each one to escape them in the string. Hence you'll end up with something like:

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

Upvotes: 0

Vivien Barousse

Reputation: 20875

String patternStr = "C:\\Users\\someUser\\(.*+)";

Backslashes (\) are escape characters in the Java Language. Your string contains the following after compilation:

C:\Users\someUser\(.*+)

This string is then parsed as a regex, which uses backslashes as an escape character as well. The regex parser tries to understand the escaped \U, \s and \(. One of them is incorrect regarding the regex syntax (hence your exception), and none of them are what you are trying to achieve.

Try

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
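The two parsing stages described above can be checked with a small sketch. The class UnderEscapedDemo and the helper compiles() are made-up names; the two pattern strings are the ones from the question and from this answer.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original answer.
class UnderEscapedDemo {

    // Returns true if the regex compiles, false if it is rejected with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The regex engine sees C:\Users\someUser\(.*+) and rejects the \U escape.
        System.out.println(compiles("C:\\Users\\someUser\\(.*+)"));       // prints "false"
        // The regex engine sees C:\\Users\\someUser\\(.*+), i.e. literal backslashes.
        System.out.println(compiles("C:\\\\Users\\\\someUser\\\\(.*+)")); // prints "true"
    }
}
```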

Upvotes: 2

fyr

Reputation: 20859

If you want to solve it with a pattern, you need to escape the pattern properly:

String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

Upvotes: 1
