stackerstack

Reputation: 265

Java streams modifying a list of strings to keep only the substring of each string

I am trying to modify a list of strings to keep only the substring of each of them. Here is what I'm trying to do:

List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");

paychecks.replaceAll(paycheck -> paycheck.subString("insert here"))

I've tried several expressions in place of "insert here", but I either get errors or the IDE underlines the code in red. Basically, I want the part of each paycheck ID after EMP_ and before the next _ . Ideally, the result should look like this:

[61299, 5512, 99993, 831]

Update (second attempt):

paychecks.forEach(paycheck -> 
                      paycheck.replaceAll(paycheck, paycheck.substring(paycheck.indexOf("Paycheck_Box_"),
                         paycheck.indexOf("Paycheck_Box_" + "\\[(.*?)\\]" + "_")))))

Error thrown:

    java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 26   
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)

Upvotes: 0

Views: 1112

Answers (5)

Grzegorz

Reputation: 197

You were almost right in your second attempt. The easiest way to do this:

String prefix = "Paycheck_Box_EMP_"; // or use 17 instead of prefix.length()

// List.replaceAll swaps each element for the lambda's result,
// so returning the substring directly is enough.
paychecks.replaceAll(paycheck ->
        paycheck.substring(prefix.length(), paycheck.lastIndexOf('_')));
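As for the exception in the question: String.indexOf searches for literal text, not a regex, so the pattern-like argument is never found and indexOf returns -1, which is the "end -1" in the stack trace. Below is a minimal sketch of the substring approach that looks for the next underscore after the prefix instead of the last one. The class name SubstringSketch and the helper employeeId are made up for illustration, and the code assumes every entry starts with the prefix and has an underscore after the ID.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringSketch {
    // Assumption: every entry starts with exactly this prefix.
    static final String PREFIX = "Paycheck_Box_EMP_";

    // Hypothetical helper: returns the text between the prefix and the next underscore.
    static String employeeId(String paycheck) {
        int start = PREFIX.length();
        // indexOf(char, fromIndex) returns -1 if there is no underscore after the prefix,
        // in which case substring would throw; the sketch assumes well-formed input.
        int end = paycheck.indexOf('_', start);
        return paycheck.substring(start, end);
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_5512_221",
                "Paycheck_Box_EMP_99993_881",
                "Paycheck_Box_EMP_831_141"));

        paychecks.replaceAll(SubstringSketch::employeeId);
        System.out.println(paychecks); // [61299, 5512, 99993, 831]
    }
}
```

Using the next underscore rather than lastIndexOf only matters if the part after the ID could itself contain underscores.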

Upvotes: 2

Jimmy

Reputation: 1051

Try it this way:

  • Step 1: Remove the prefix Paycheck_Box_EMP_ from the target string, e.g. Paycheck_Box_EMP_61299_451 becomes the intermediate result 61299_451.

  • Step 2: Remove the suffix _451 from the intermediate result, so the final substring is 61299.

    paychecks.replaceAll(x -> x.replaceFirst("^Paycheck_Box_EMP_", "").replaceFirst("_.*$", ""));

Upvotes: 1

Arvind Kumar Avinash

Reputation: 79035

You can use the regex Paycheck_Box_EMP_(\d+).* and replace each string with capture group 1 (written as $1 in the replacement).

Demo:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>();
        paychecks.add("Paycheck_Box_EMP_61299_451");
        paychecks.add("Paycheck_Box_EMP_5512_221");
        paychecks.add("Paycheck_Box_EMP_99993_881");
        paychecks.add("Paycheck_Box_EMP_831_141");

        List<String> substrs = 
                paychecks.stream()
                        .map(s -> s.replaceAll("Paycheck_Box_EMP_(\\d+).*", "$1"))
                        .collect(Collectors.toList());

        System.out.println(substrs);
    }
}

Output:

[61299, 5512, 99993, 831]


Upvotes: 1

tquadrat

Reputation: 4034

If I understand the task correctly, you want the ##### from Paycheck_Box_EMP_#####_451.

So you do not want to replace anything; you want to extract something, right?

That could work like this:

List<String> paychecks = new ArrayList<>();
paychecks.add( "Paycheck_Box_EMP_61299_451" );
paychecks.add( "Paycheck_Box_EMP_5512_221" );
paychecks.add( "Paycheck_Box_EMP_99993_881" );
paychecks.add( "Paycheck_Box_EMP_831_141" );

final var pattern = Pattern.compile( "Paycheck_Box_EMP_(\\d{3,5})_\\d{3}" );
paychecks = paychecks.stream()
  .map( paycheck -> pattern.matcher( paycheck ) )
  .filter( matcher -> matcher.find() )
  .map( matcher -> matcher.group( 1 ) )
  .collect( Collectors.toList() );

Or, if you insist on using List.replaceAll():

List<String> paychecks = new ArrayList<>();
paychecks.add( "Paycheck_Box_EMP_61299_451" );
paychecks.add( "Paycheck_Box_EMP_5512_221" );
paychecks.add( "Paycheck_Box_EMP_99993_881" );
paychecks.add( "Paycheck_Box_EMP_831_141" );

final var pattern = Pattern.compile( "Paycheck_Box_EMP_(\\d{3,5})_\\d{3}" );
paychecks.replaceAll( paycheck -> 
{
  var matcher = pattern.matcher( paycheck );
  matcher.find();
  return matcher.group( 1 );
} );

Fixed the Java based on Alex Rudenko's comments.
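One thing to keep in mind with the List.replaceAll() variant: if an entry does not match the pattern, find() returns false and the following group( 1 ) call throws an IllegalStateException. Here is a rough sketch of one way to guard against that, assuming you would rather keep a non-matching entry unchanged. The class name GuardedExtract, the helper idOrOriginal, and the "Invoice_7" entry are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GuardedExtract {
    static final Pattern PATTERN = Pattern.compile("Paycheck_Box_EMP_(\\d{3,5})_\\d{3}");

    // Hypothetical helper: returns the captured ID, or the input unchanged when it does not match.
    static String idOrOriginal(String paycheck) {
        Matcher matcher = PATTERN.matcher(paycheck);
        return matcher.find() ? matcher.group(1) : paycheck;
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Invoice_7",                  // made-up entry that does not match
                "Paycheck_Box_EMP_831_141"));

        paychecks.replaceAll(GuardedExtract::idOrOriginal);
        System.out.println(paychecks); // [61299, Invoice_7, 831]
    }
}
```

Whether to keep, drop, or reject non-matching entries depends on your data; the stream version with filter() drops them instead.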

Upvotes: 2

azurefrog

Reputation: 10945

Personally, I'm bad at writing, and even worse at reading regular expressions, so rather than trying to make the replacement efficient, I'd prioritize human readability.

Unless I'm looking at modifying a really large set of data, I'd do something like:

List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");
    

paychecks.replaceAll(paycheck -> paycheck
                                .replaceFirst("^Paycheck_Box_EMP_", "") // remove prefix
                                .replaceFirst("_.*$", ""));             // remove suffix

    
System.out.println(paychecks);      // [61299, 5512, 99993, 831]

You could further refine the prefix and suffix regexes, depending on how precisely you know the format.

For instance, in your updated question, the prefix is always constant, so you could use a simple replace() call instead. Likewise, if you know the suffix is always numeric, you could use [0-9]* instead of .*.
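A minimal sketch of that refinement, assuming the prefix is the fixed text Paycheck_Box_EMP_ and the suffix is an underscore followed only by digits (the class name LiteralPrefixSketch and the helper employeeId are just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class LiteralPrefixSketch {
    // Hypothetical helper combining a literal replace() with a digits-only suffix regex.
    static String employeeId(String paycheck) {
        return paycheck
                .replace("Paycheck_Box_EMP_", "")  // literal text, no regex involved
                .replaceFirst("_[0-9]*$", "");     // trailing underscore plus digits
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_5512_221",
                "Paycheck_Box_EMP_99993_881",
                "Paycheck_Box_EMP_831_141"));

        paychecks.replaceAll(LiteralPrefixSketch::employeeId);
        System.out.println(paychecks); // [61299, 5512, 99993, 831]
    }
}
```

Note that replace() removes the text wherever it occurs, not only at the start of the string, which is fine for data shaped like the example.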

Upvotes: 2
