Reputation: 478
I have several strings that contain one or more digits and may also contain one or more letters following the digits (letter case doesn't matter). The strings match the following regex pattern:
[0-9]+[a-zA-Z]*
and may look like:
"15791"
"14810A"
"10480ABCD"
"5ABCDEFGH"
If one of the strings above contains non-numerical characters, how do I split it into the number (first part) as an integer and the letters (second part) as a string?
I know I can split a string like this:
array = "1,2,3,4".split(',')
But this doesn't help since I don't have a separator.
Upvotes: 3
Views: 5045
Reputation: 110675
[Edit: for some reason @Humza deleted his answer, so I've undeleted mine. I had previously posted this, but then deleted it when I noticed that Humza had already posted a similar answer.]
I feel like I must be missing something, as it seems to have a straightforward solution:
def extract(str)
str.scan(/\d+|[A-Z]+/i)
end
extract "15791" #=> ["15791"]
extract "14810A" #=> ["14810", "A"]
extract "10480ABCD" #=> ["10480", "ABCD"]
extract "5ABCDEFGH" #=> ["5", "ABCDEFGH"]
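The question asks for the number as an integer, while scan returns strings. A minimal sketch, assuming every input matches the question's pattern (digits first, optional letters after); the method name extract_parts is hypothetical:

```ruby
# Hypothetical helper: splits a string like "10480ABCD" into [Integer, String].
# Assumes the input matches [0-9]+[a-zA-Z]*, so the first token is always digits.
def extract_parts(str)
  # scan returns ["10480", "ABCD"], or just ["15791"] when there are no letters
  digits, letters = str.scan(/\d+|[A-Z]+/i)
  # Integer(..., 10) forces base 10, so a leading zero such as "0123" is not read as octal
  [Integer(digits, 10), letters.to_s] # nil.to_s gives "" when there are no letters
end

p extract_parts("15791")     # => [15791, ""]
p extract_parts("14810A")    # => [14810, "A"]
p extract_parts("5ABCDEFGH") # => [5, "ABCDEFGH"]
```

Plain digits.to_i would also work here; Integer with an explicit base is stricter and raises an ArgumentError instead of silently returning 0 if the first token is not numeric.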
Upvotes: 0
Reputation: 168101
The splitter is the non-numerical characters themselves; wrapping them in a capture group keeps them in the result:
"10480ABCD".split(/(\D+)/)
# => ["10480", "ABCD"]
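The parentheses matter: String#split only keeps the separator in its result when the pattern contains a capture group. A small check on the question's sample strings (the variable name samples is illustrative):

```ruby
samples = ["15791", "14810A", "10480ABCD", "5ABCDEFGH"]

# Without a capture group the letters are treated as a delimiter and dropped
p "10480ABCD".split(/\D+/) # => ["10480"]

# With a capture group the matched letters are kept as their own element;
# a string with no letters comes back as a single-element array
p samples.map { |s| s.split(/(\D+)/) }
# => [["15791"], ["14810", "A"], ["10480", "ABCD"], ["5", "ABCDEFGH"]]
```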
Upvotes: 9
Reputation: 114178
You can always use Regexp#match:
re = /(\d+)([a-z]*)/i
str = "10480ABCD"
m = re.match(str)
m #=> #<MatchData "10480ABCD" 1:"10480" 2:"ABCD">
m[0] #=> "10480ABCD"
m[1] #=> "10480"
m[2] #=> "ABCD"
Use MatchData#[] with a start index and a length to extract both capture groups:
re.match(str)[1, 2] #=> ["10480", "ABCD"]
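The pattern above is not anchored, so it also matches part of a string such as "abc123". A sketch that anchors it with \A and \z, so the whole string must fit the question's pattern, and uses MatchData#captures; the constant PATTERN and the method parse are hypothetical names:

```ruby
# Anchored form of re: \A and \z require the whole string to match
PATTERN = /\A(\d+)([a-z]*)\z/i

# Hypothetical helper: returns [Integer, String], or nil if str doesn't fit the pattern
def parse(str)
  m = PATTERN.match(str)
  return nil unless m
  number, letters = m.captures # an empty letter group gives "", not nil
  [number.to_i, letters]
end

p parse("10480ABCD") # => [10480, "ABCD"]
p parse("15791")     # => [15791, ""]
p parse("abc123")    # => nil
```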
Upvotes: 0
Reputation: 174706
Use a regex built from a positive lookbehind and a positive lookahead assertion with String#split:
> "10480ABCD".split(/(?<=\d)(?=[A-Za-z])/)
=> ["10480", "ABCD"]
(?<=\d)
Positive lookbehind which asserts that the match must be preceded by a digit character.
(?=[A-Za-z])
Positive lookahead which asserts that the match must be followed by a letter. So the above regex matches the zero-width boundary between a digit and a letter. Splitting your input at that boundary gives the desired output.
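Because the lookarounds match a zero-width position, no characters are consumed, and a string without letters comes back whole. The sketch below runs the pattern over the question's samples; the last line uses an input outside the question's pattern, only to show that every digit-to-letter boundary becomes a split point (BOUNDARY and samples are illustrative names):

```ruby
# Zero-width boundary: preceded by a digit, followed by a letter
BOUNDARY = /(?<=\d)(?=[A-Za-z])/

samples = ["15791", "14810A", "10480ABCD", "5ABCDEFGH"]
p samples.map { |s| s.split(BOUNDARY) }
# => [["15791"], ["14810", "A"], ["10480", "ABCD"], ["5", "ABCDEFGH"]]

# Every digit-to-letter boundary is a split point, not only the first one
p "12AB34CD".split(BOUNDARY) # => ["12", "AB34", "CD"]
```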
OR
Use String#scan:
> "10480ABCD".scan(/\d+|[A-Za-z]+/)
=> ["10480", "ABCD"]
Upvotes: 11