user9174501

How to use regex to split string into groups of identical characters?

I have a string like this:

var string = "AAAAAAABBBCCCCCCDD"

and I'd like to use a regular expression to split it into an array in which each element is a run of consecutive identical characters:

["AAAAAAA", "BBB", "CCCCCC", "DD"]

This is what I have so far, but I can't get it working:


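import Foundation

var array = [String]()
var string = "AAAAAAABBBCCCCCCDD"
let pattern = "\\ b([1,][a-z])\\" // mistake?!
let regex = try! NSRegularExpression(pattern: pattern, options: [])

// matches(in:options:range:) returns NSTextCheckingResult values; NSRange counts UTF-16 code units
let results = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
array = results.map { (string as NSString).substring(with: $0.range) }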

Upvotes: 3

Views: 2612

Answers (2)

Martin R

Reputation: 539685

You can achieve that with a "back reference"; compare the NSRegularExpression documentation:

\n

Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern.
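For illustration, here is a minimal self-contained sketch that uses only Foundation's NSRegularExpression. The helper name groupRuns(in:) is hypothetical. In the pattern "(.)\\1*", group 1 captures a single character and \1 matches further copies of whatever group 1 captured; reading range(at: 1) of each match shows that captured character next to the full run.

```swift
import Foundation

/// Hypothetical helper: splits `text` into runs of identical characters and
/// pairs each run with the character captured by group 1.
func groupRuns(in text: String) -> [(run: String, char: String)] {
    // (.) captures one character; \1* then matches zero or more copies of it.
    let regex = try! NSRegularExpression(pattern: "(.)\\1*")
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map { match -> (run: String, char: String) in
        // Convert each NSRange back to a Range<String.Index> before slicing.
        let run = String(text[Range(match.range, in: text)!])
        let char = String(text[Range(match.range(at: 1), in: text)!])
        return (run: run, char: char)
    }
}

for (run, char) in groupRuns(in: "AAAAAAABBBCCCCCCDDE") {
    print("\(char): \(run)")
}
```

The force unwraps are safe here because every range comes from a match against the same string.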

Example (using the utility method from the question "Swift extract regex matches"):

let string = "AAAAAAABBBCCCCCCDDE"
let pattern = "(.)\\1*"

let array = matches(for: pattern, in: string)
print(array)
// ["AAAAAAA", "BBB", "CCCCCC", "DD", "E"]

The pattern matches an arbitrary character, followed by zero or more occurrences of the same character. If you are only interested in repeating word characters, use

let pattern = "(\\w)\\1*"

instead.
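To see the difference, the sketch below runs both patterns over a sample string that also contains spaces and punctuation. The helper runs(_:in:) and the sample input are assumptions made for this illustration. "." matches spaces and "!" as well, so those runs are returned too, while "\\w" only matches word characters, so spaces and "!" are skipped.

```swift
import Foundation

/// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func runs(_ pattern: String, in text: String) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map {
        String(text[Range($0.range, in: text)!])
    }
}

let sample = "aa  bb!!"
// "." matches any character except a line terminator, so spaces and "!" form runs too.
print(runs("(.)\\1*", in: sample))   // ["aa", "  ", "bb", "!!"]
// "\\w" matches only word characters, so spaces and "!" are skipped.
print(runs("(\\w)\\1*", in: sample)) // ["aa", "bb"]
```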

Upvotes: 1

Mo Abdul-Hameed

Reputation: 6110

You can achieve that using this function from this answer:

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Pass "(.)\\1+" as the regex and "AAAAAAABBBCCCCCCDD" as the text:

let result = matches(for: "(.)\\1+", in: "AAAAAAABBBCCCCCCDD")
print(result) // ["AAAAAAA", "BBB", "CCCCCC", "DD"]
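One caveat with "(.)\\1+": the + quantifier requires at least one repetition of the captured character, so a character that occurs only once is not matched at all. The sketch below compares it with "(.)\\1*" on an input with a single trailing "E". The helper allMatches(_:in:) is a hypothetical, compact variant of matches(for:in:) above, repeated here so the snippet runs on its own.

```swift
import Foundation

// Hypothetical helper: a compact variant of matches(for:in:).
func allMatches(_ pattern: String, in text: String) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern)
    return regex.matches(in: text, range: NSRange(text.startIndex..., in: text))
        .map { String(text[Range($0.range, in: text)!]) }
}

let input = "AAAAAAABBBCCCCCCDDE" // note the single trailing "E"
// "+" needs at least one repeat, so the lone "E" is dropped.
print(allMatches("(.)\\1+", in: input)) // ["AAAAAAA", "BBB", "CCCCCC", "DD"]
// "*" allows zero repeats, so a single character forms its own group.
print(allMatches("(.)\\1*", in: input)) // ["AAAAAAA", "BBB", "CCCCCC", "DD", "E"]
```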

Upvotes: 1
