Reputation: 265
I am trying to modify a list of strings so that each element is replaced by a substring of itself. Here is what I'm trying to do:
List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");
paychecks.replaceAll(paycheck -> paycheck.subString("insert here"))
I've tried several things in place of "insert here", but I either get errors or the code is underlined in red. Basically, I want the part of the paycheck ID after EMP_ and before the next _. So ideally the result should be:
[61299, 5512, 99993, 831]
Update (second attempt):
paychecks.forEach(paycheck ->
paycheck.replaceAll(paycheck, paycheck.substring(paycheck.indexOf("Paycheck_Box_"),
paycheck.indexOf("Paycheck_Box_" + "\\[(.*?)\\]" + "_")))))
Error thrown:
java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 26
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
at java.base/java.lang.String.substring(String.java:1874)
Upvotes: 0
Views: 1112
Reputation: 197
You were almost right in your second attempt. The easiest way to do this:
String prefix = "Paycheck_Box_EMP_"; // or use 17 instead of prefix.length()
// substring() already returns the ID, so no inner String.replaceAll() call is needed
paychecks.replaceAll(paycheck ->
    paycheck.substring(prefix.length(), paycheck.lastIndexOf('_')));
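For context on the exception in the question: String.indexOf searches for its argument literally, not as a regex, so the regex-like string is never found, indexOf returns -1, and substring is called with end -1. Below is a minimal sketch of a purely index-based version that follows the "after EMP_ and before the next _" wording. The class name PaycheckIdBySubstring and the helper extractId are made up for illustration, and returning unexpected input unchanged is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;

class PaycheckIdBySubstring {
    // Hypothetical helper: returns the text between "EMP_" and the next '_'.
    static String extractId(String paycheck) {
        String marker = "EMP_";
        int markerPos = paycheck.indexOf(marker);  // literal search, not a regex
        if (markerPos < 0) {
            return paycheck;                       // assumption: leave unexpected input unchanged
        }
        int start = markerPos + marker.length();
        int end = paycheck.indexOf('_', start);    // first '_' after the ID
        return end < 0 ? paycheck.substring(start) : paycheck.substring(start, end);
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_5512_221",
                "Paycheck_Box_EMP_99993_881",
                "Paycheck_Box_EMP_831_141"));
        paychecks.replaceAll(PaycheckIdBySubstring::extractId);
        System.out.println(paychecks); // [61299, 5512, 99993, 831]
    }
}
```

Unlike the lastIndexOf('_') variant, this one does not depend on the ID being followed by exactly one more underscore-separated field.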
Upvotes: 2
Reputation: 1051
Try it this way:
Step 1: Remove the prefix, i.e. Paycheck_Box_EMP_ from Paycheck_Box_EMP_61299_451, which leaves the intermediate result 61299_451.
Step 2: Remove the suffix, i.e. _451 from 61299_451, which leaves the final result 61299.
paychecks.replaceAll(x -> x.replaceFirst("^Paycheck_Box_EMP_", "").replaceFirst("_.*$", ""));
Upvotes: 1
Reputation: 79035
You can use the regex Paycheck_Box_EMP_(\d+).* and replace each string with group(1).
Demo:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");
List<String> substrs =
paychecks.stream()
.map(s -> s.replaceAll("Paycheck_Box_EMP_(\\d+).*", "$1"))
.collect(Collectors.toList());
System.out.println(substrs);
}
}
Output:
[61299, 5512, 99993, 831]
Upvotes: 1
Reputation: 4034
If I understand the task correctly, you want to have the ##### from Paycheck_Box_EMP_#####_451.
So you do not want to replace something, what you want is to extract something, right?
This should work like this:
List<String> paychecks = new ArrayList<>();
paychecks.add( "Paycheck_Box_EMP_61299_451" );
paychecks.add( "Paycheck_Box_EMP_5512_221" );
paychecks.add( "Paycheck_Box_EMP_99993_881" );
paychecks.add( "Paycheck_Box_EMP_831_141" );
final var pattern = Pattern.compile( "Paycheck_Box_EMP_(\\d{3,5})_\\d{3}" );
paychecks = paychecks.stream()
.map( paycheck -> pattern.matcher( paycheck ) )
.filter( matcher -> matcher.find() )
.map( matcher -> matcher.group( 1 ) )
.collect( Collectors.toList() );
Or, if you insist on using List.replaceAll():
List<String> paychecks = new ArrayList<>();
paychecks.add( "Paycheck_Box_EMP_61299_451" );
paychecks.add( "Paycheck_Box_EMP_5512_221" );
paychecks.add( "Paycheck_Box_EMP_99993_881" );
paychecks.add( "Paycheck_Box_EMP_831_141" );
final var pattern = Pattern.compile( "Paycheck_Box_EMP_(\\d{3,5})_\\d{3}" );
paychecks.replaceAll( paycheck ->
{
var matcher = pattern.matcher( paycheck );
matcher.find();
return matcher.group( 1 );
} );
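One caveat with the List.replaceAll() variant: the result of matcher.find() is ignored, so an element that does not match the pattern makes matcher.group( 1 ) throw an IllegalStateException. A minimal sketch of a guarded version follows. The class name PaycheckIdByPattern and the helper extractId are hypothetical, and keeping a non-matching element unchanged is an assumption; throwing a more descriptive exception would be just as reasonable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PaycheckIdByPattern {
    // Same pattern as above: group 1 captures the 3 to 5 digit ID.
    private static final Pattern PATTERN =
            Pattern.compile("Paycheck_Box_EMP_(\\d{3,5})_\\d{3}");

    // Hypothetical helper: returns the ID, or the input itself if it does not match.
    static String extractId(String paycheck) {
        Matcher matcher = PATTERN.matcher(paycheck);
        // Only call group(1) after find() has reported a match.
        return matcher.find() ? matcher.group(1) : paycheck;
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_831_141",
                "Unexpected_Format"));
        paychecks.replaceAll(PaycheckIdByPattern::extractId);
        System.out.println(paychecks); // [61299, 831, Unexpected_Format]
    }
}
```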
Fixed the Java based on Alex Rudenko's comments.
Upvotes: 2
Reputation: 10945
Personally, I'm bad at writing, and even worse at reading regular expressions, so rather than trying to make the replacement efficient, I'd prioritize human readability.
Unless I'm looking at modifying a really large set of data, I'd do something like:
List<String> paychecks = new ArrayList<>();
paychecks.add("Paycheck_Box_EMP_61299_451");
paychecks.add("Paycheck_Box_EMP_5512_221");
paychecks.add("Paycheck_Box_EMP_99993_881");
paychecks.add("Paycheck_Box_EMP_831_141");
paychecks.replaceAll(paycheck -> paycheck
.replaceFirst("^Paycheck_Box_EMP_", "") // remove prefix
.replaceFirst("_.*$", "")); // remove suffix
System.out.println(paychecks); // [61299, 5512, 99993, 831]
You could further refine the prefix and suffix regexes, depending on how precisely you know what the format is going to be.
For instance, in your updated question, the prefix is always constant, so you could use a simple replace() call instead. Likewise, if you know the suffix is always numeric, you could use [0-9]* instead of .*.
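As a rough sketch of that refinement, assuming the prefix is always exactly Paycheck_Box_EMP_ and the suffix is an underscore followed only by digits (the class name PaycheckIdByReplace and the helper extractId are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class PaycheckIdByReplace {
    // Hypothetical helper combining a literal replace() with a stricter suffix regex.
    static String extractId(String paycheck) {
        return paycheck
                // replace() takes a literal, not a regex; note it removes every
                // occurrence of the text, not only one at the start of the string.
                .replace("Paycheck_Box_EMP_", "")
                // Strip a trailing '_' followed by digits only.
                .replaceFirst("_[0-9]*$", "");
    }

    public static void main(String[] args) {
        List<String> paychecks = new ArrayList<>(List.of(
                "Paycheck_Box_EMP_61299_451",
                "Paycheck_Box_EMP_5512_221",
                "Paycheck_Box_EMP_99993_881",
                "Paycheck_Box_EMP_831_141"));
        paychecks.replaceAll(PaycheckIdByReplace::extractId);
        System.out.println(paychecks); // [61299, 5512, 99993, 831]
    }
}
```

With the stricter suffix pattern, an element whose suffix is not numeric is left with its suffix intact, which makes unexpected input easier to spot than with .*.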
Upvotes: 2