Ilya Fedoseev

Reputation: 287

Why is Kotlin String.split with a regex string not the same as Java?

I have the following Java code:

String str = "12+20*/2-4";
// String.split returns a String[], so it cannot be assigned to a List<String>
String[] arr = str.split("\\p{Punct}");

//expected: arr = {12,20,2,4}

I want the equivalent Kotlin code, but .split("\\p{Punct}") doesn't work. I don't understand the documentation here: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/split.html

Upvotes: 25

Views: 20129

Answers (4)

Ajith M A

Reputation: 5348

From Kotlin 1.6 there is a better and easier way of doing this.

fun main() {
    val str = "12+20*/2-4"
    val regex = "\\p{Punct}".toRegex()
    // splitToSequence returns a lazy Sequence<String> rather than a List<String>
    val arr = str.splitToSequence(regex)
    arr.forEach { println(it) }
}

Reference: https://kotlinlang.org/docs/whatsnew16.html#splitting-regex-into-a-sequence
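As a rough sketch of where the sequence form can help, the pieces can be filtered and converted lazily before anything is collected into a list. The function name parseOperands is made up for illustration, and the "+" quantifier is added here so that adjacent punctuation does not produce empty pieces. This assumes Kotlin 1.6 or later.

```kotlin
// Hypothetical helper: splits on runs of punctuation and parses each piece as an Int.
fun parseOperands(expr: String): List<Int> =
    expr.splitToSequence(Regex("\\p{Punct}+"))
        .filter { it.isNotEmpty() }   // guards against leading/trailing punctuation
        .map { it.toInt() }
        .toList()

fun main() {
    println(parseOperands("12+20*/2-4"))   // [12, 20, 2, 4]
}
```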

Upvotes: 0

River

Reputation: 9093

For regex behavior, your argument must be of type Regex, not merely a String containing special regex characters.

Most string manipulation methods in Kotlin (replace, split, etc.) can take both String and Regex arguments, but you must convert your String to Regex if you want regex-specific matching.

This conversion can be done using String.toRegex() or Regex(String):

val str = "12+20*/2-4"
str.split("\\p{Punct}".toRegex()) // this
str.split(Regex("\\p{Punct}"))    // or this

Currently split is treating your argument as a literal delimiter: it searches the input for the exact text \p{Punct} instead of interpreting it as a regex character class, so nothing is split.
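A small sketch of the difference between the two overloads, using the string from the question. The helper names splitLiteral and splitRegex are invented for this example.

```kotlin
// String overload: the delimiter is the literal text \p{Punct}
fun splitLiteral(input: String): List<String> =
    input.split("\\p{Punct}")

// Regex overload: \p{Punct} is interpreted as the punctuation character class
fun splitRegex(input: String): List<String> =
    input.split(Regex("\\p{Punct}"))

fun main() {
    val str = "12+20*/2-4"
    println(splitLiteral(str))              // [12+20*/2-4]
    println(splitRegex(str))                // [12, 20, , 2, 4]
    // The literal overload only splits where that exact text occurs:
    println(splitLiteral("a\\p{Punct}b"))   // [a, b]
}
```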


As @holi-java mentions in their answer, this produces an empty string between * and /, giving ["12","20","","2","4"]. You can use "\\p{Punct}+" as your regex to avoid this. (Note that Java also produces this empty string unless the + is included there as well.)
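To check the Java side from Kotlin, one option is to call java.util.regex.Pattern directly, which the Java documentation describes as giving the same result as String.split(regex). This is only a sketch, and javaStyleSplit is a made-up helper name.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: splits the way Java's String.split(regex) does.
fun javaStyleSplit(input: String, pattern: String): List<String> =
    Pattern.compile(pattern).split(input).toList()

fun main() {
    val str = "12+20*/2-4"
    println(javaStyleSplit(str, "\\p{Punct}"))    // [12, 20, , 2, 4]
    println(javaStyleSplit(str, "\\p{Punct}+"))   // [12, 20, 2, 4]
    println(str.split(Regex("\\p{Punct}+")))      // [12, 20, 2, 4]
}
```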

Upvotes: 7

holi-java

Reputation: 30676

You should use String#split(Regex) instead, for example:

val str = "12+20*/2-4"
val arr = str.split("\\p{Punct}".toRegex())
//  ^--- but the result is ["12","20","","2","4"]

val arr2 = arr.filter { it.isNotBlank() }
//  ^--- you can filter out the blank entries; the result is ["12","20","2","4"]

Or you can split on runs of consecutive punctuation by using \\p{Punct}+, for example:

val arr = str.split("\\p{Punct}+".toRegex())
//  ^--- result is: ["12","20","2","4"]

Or invert the regex and use Regex#findAll instead; this way you can also pick up negative numbers. For example:

val str = "12+20*/2+(-4)"

val arr = "(?<!\\d)-?[^\\p{Punct}]+".toRegex().findAll(str).map { it.value }.toList()
//  ^--- result is ["12","20","2","-4"]
//   the negative number is found  ---^

Upvotes: 34

tango24

Reputation: 474

You can call

str.split(Regex("\\p{Punct}"))

Upvotes: 3
