NickEckhart

Reputation: 478

Split a string into a string and an integer

I have several strings that contain one or more digits, optionally followed by one or more letters (letter case doesn't matter). The strings match this regex pattern:

[0-9]+[a-zA-Z]*

and may look like:

"15791"
"14810A"
"10480ABCD"
"5ABCDEFGH"

If one of the strings above contains non-numerical characters, how do I split it so that the digits (first part) become an integer and the letters (second part) become a string?

I know I can split a string like this:

array = "1,2,3,4".split(',')

But this doesn't help since I don't have a separator.

Upvotes: 3

Views: 5045

Answers (4)

Cary Swoveland

Reputation: 110675

[Edit: for some reason @Humza deleted his answer, so I've undeleted mine. I had previously posted this, but then deleted it when I noticed that Humza had already posted a similar answer.]

I feel like I must be missing something, as this seems to have a straightforward solution:

def extract(str)
  str.scan(/\d+|[A-Z]+/i)
end

extract "15791"     #=> ["15791"] 
extract "14810A"    #=> ["14810", "A"] 
extract "10480ABCD" #=> ["10480", "ABCD"]
extract "5ABCDEFGH" #=> ["5", "ABCDEFGH"] 

Upvotes: 0

sawa

Reputation: 168101

The separator is the run of non-digit characters itself. Wrapping it in a capture group makes split keep it in the result:

"10480ABCD".split(/(\D+)/)
# => ["10480", "ABCD"]
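To make the role of the capture group concrete, here is a small sketch comparing the same split with and without it. The variable names are only illustrative.

```ruby
# A capture group in the pattern makes String#split keep the separator text.
with_group    = "10480ABCD".split(/(\D+)/)  # separator "ABCD" is kept
without_group = "10480ABCD".split(/\D+/)    # separator is discarded
digits_only   = "15791".split(/(\D+)/)      # no separator present

p with_group    # prints ["10480", "ABCD"]
p without_group # prints ["10480"]
p digits_only   # prints ["15791"]
```

Note that a digits-only string yields a one-element array, so the letters part is simply absent rather than an empty string.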

Upvotes: 9

Stefan

Reputation: 114178

You can always use match:

re = /(\d+)([a-z]*)/i
str = "10480ABCD"

m = re.match(str)
m    #=> #<MatchData "10480ABCD" 1:"10480" 2:"ABCD">
m[1] #=> "10480"
m[2] #=> "ABCD"

Pass a start index and a length to MatchData#[] to extract both capture groups at once:

re.match(str)[1, 2]
#=> ["10480", "ABCD"]
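Since the question asks for an integer and a string, the match approach could be wrapped roughly as follows. This is only a sketch: the method name split_number_and_letters is hypothetical, and the \A and \z anchors are an added assumption so that input outside the question's pattern is rejected instead of partially matched.

```ruby
# Hypothetical helper (the name is illustrative, not from the answers here).
# Returns [Integer, String], or nil when the input does not fit the pattern.
def split_number_and_letters(str)
  # \A and \z anchor the match to the whole string (an added assumption).
  m = /\A(\d+)([a-z]*)\z/i.match(str)
  return nil unless m

  [m[1].to_i, m[2]]
end

p split_number_and_letters("15791")     # prints [15791, ""]
p split_number_and_letters("10480ABCD") # prints [10480, "ABCD"]
p split_number_and_letters("ABC")       # prints nil
```

Because the second group uses *, a digits-only input gives an empty string for the letters part rather than nil.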

Upvotes: 0

Avinash Raj

Reputation: 174706

Use a regex built from lookaround assertions (a positive lookbehind and a positive lookahead) in string.split.

> "10480ABCD".split(/(?<=\d)(?=[A-Za-z])/)
=> ["10480", "ABCD"]
  • (?<=\d) is a positive lookbehind, which asserts that the match must be preceded by a digit.

  • (?=[A-Za-z]) is a positive lookahead, which asserts that the match must be followed by a letter. Together, the two assertions match the zero-width boundary between a digit and a letter. Splitting the input at that boundary gives the desired output.

OR

Use string.scan

> "10480ABCD".scan(/\d+|[A-Za-z]+/)
=> ["10480", "ABCD"]
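As a rough illustration, the lookaround split from the start of this answer can be run over the four sample strings from the question, converting the first part to an integer. The names samples and pairs are only illustrative.

```ruby
samples = ["15791", "14810A", "10480ABCD", "5ABCDEFGH"]

pairs = samples.map do |s|
  # Split at the zero-width boundary between a digit and a letter.
  number, letters = s.split(/(?<=\d)(?=[A-Za-z])/)
  [number.to_i, letters.to_s] # nil.to_s is "" when there are no letters
end

p pairs
# prints [[15791, ""], [14810, "A"], [10480, "ABCD"], [5, "ABCDEFGH"]]
```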

Upvotes: 11
