mpak

Reputation: 2518

Split a string at the first occurrence of n consecutive digits

I'm parsing folder names. My program lists the subfolders of a folder and parses each folder's name.

For example, one folder could be named something like this:

"Folder.Name.1234.Some.Info.Here-ToBeParsed"

and I would like to parse it so that the name becomes "Folder Name". At the moment I first use String.replaceAll() to get rid of special characters, which still leaves the 4-digit sequence in the string. I would like to split the string at that point. How can I achieve this?

Currently my code looks something like this:

// Parse the string if regex p matches the folder's name
if (b) {
    //System.out.println("Folder: \" " + name + "\" contains special characters.");
    String result = name.replaceAll("[\\p{P}\\p{S}]", " "); // Replace all punctuation and symbols with spaces.
    //System.out.println("Parsed: " + name + " > " + result);

    // If the string matches regex p2
    if (b2) {
        //System.out.println("Folder: \" " + result + "\" contains release year.");
        String[] parsed_name = result.split("20"); // This is the line where I would like to split on 4 digits in a row instead.
        //System.out.println("Parsed: " + result + " > " + parsed_name[0]);
        movieNames.add(parsed_name[0]);
    }
}

Or maybe there is an even easier way to do this? Thanks in advance!

Upvotes: 2

Views: 216

Answers (1)

anubhava

Reputation: 784998

You should keep it simple like this:

String name = "Folder.Name.1234.Some.Info.Here-ToBeParsed";
String repl = name.replaceFirst("\\.\\d{4}.*", "")
                  .replaceAll("[\\p{P}\\p{S}&&[^']]+", " ");
//=> Folder Name
  • replaceFirst removes everything from the first dot that is followed by 4 digits through the end of the string
  • replaceAll replaces each run of punctuation and symbol characters (except the apostrophe) with a single space
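
For anyone who wants to try this out, here is a minimal runnable sketch that wraps the two calls above in a helper. The class name FolderNameParser and the methods parseTitle and splitOnDigits are hypothetical names chosen for illustration. splitOnDigits is an alternative closer to what the question literally asks for (splitting the already-cleaned string on a standalone 4-digit group), not part of the answer's approach. Both assume the 4-digit group marks the end of the title, so a title that itself contains such a group would be cut short.

```java
import java.util.ArrayList;
import java.util.List;

public class FolderNameParser {

    // Hypothetical helper: drops everything from the first ".dddd" onward,
    // then replaces runs of punctuation/symbols (except apostrophes) with one space.
    static String parseTitle(String folderName) {
        return folderName
                .replaceFirst("\\.\\d{4}.*", "")
                .replaceAll("[\\p{P}\\p{S}&&[^']]+", " ")
                .trim();
    }

    // Hypothetical alternative: clean the name first, then split on the first
    // standalone 4-digit group and keep the part before it.
    static String splitOnDigits(String folderName) {
        String cleaned = folderName.replaceAll("[\\p{P}\\p{S}&&[^']]+", " ");
        // Limit 2 means the string is split at most once.
        return cleaned.split("\\b\\d{4}\\b", 2)[0].trim();
    }

    public static void main(String[] args) {
        List<String> movieNames = new ArrayList<>();
        movieNames.add(parseTitle("Folder.Name.1234.Some.Info.Here-ToBeParsed"));
        movieNames.add(splitOnDigits("Folder.Name.1234.Some.Info.Here-ToBeParsed"));
        System.out.println(movieNames); // [Folder Name, Folder Name]
    }
}
```

If a folder name has no 4-digit group, both helpers simply return the cleaned name unchanged.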

Upvotes: 1
