user180574

Reputation: 6084

Ruby: How to split string while keeping delimiter and delimiter has length > 1?

Previous related questions only cover delimiters of length 1.

What I want is the following (for example)

str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
arr = str.magic_split('Hello:')

=> arr[0] = 'Hello: Alice '
   arr[1] = 'Hello: Bob '
   arr[2] = 'Hello: Charlie '
   arr[3] = 'Hello: David'

I tried str.scan(/Hello:/), but I don't know how to write the regex to make it work. Thanks a lot.

I see that some of the answers only work for this particular case. Let me be more specific.

The file I want to split is like the following and the delimiter is "Certificate:"

Certificate:
    Data: ...
    Signature Algorithm: ...
...
-----BEGIN CERTIFICATE-----
F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n
2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n
ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n
fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n
epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n
KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n
...
Certificate:
...
Certificate:
...

Basically, between occurrences of "Certificate:" there can be arbitrary ASCII characters.

Thanks again.

Upvotes: 2

Views: 744

Answers (6)

Cary Swoveland

Reputation: 110665

So many ways...

 str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
 str.split("Hello:")[1..-1].map {|s| "Hello:"+s}

or

 str.split(/(Hello:)/)[1..-1].each_slice(2).map(&:join)

Notice that in the latter method a regex is used which contains the string "Hello:" in a capture group. As a result:

 str.split(/(Hello:)/)
   #=> ["", "Hello:", " Alice ", "Hello:", " Bob ",
   #    "Hello:", " Charlie ", "Hello:", " David"] 

whereas:

 str.split(/Hello:/)
   #=> ["", " Alice ", " Bob ", " Charlie ", " David"]
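The capture-group form generalizes to any delimiter. Here is a minimal sketch, assuming a hypothetical helper named magic_split (the name is borrowed from the question; it is not a built-in String method). Regexp.escape is used so the delimiter is matched literally:

```ruby
# Hypothetical helper; not part of Ruby's String API.
def magic_split(str, delimiter)
  # The capture group makes split keep each delimiter in the result.
  head, *rest = str.split(/(#{Regexp.escape(delimiter)})/)
  # rest alternates delimiter, text, delimiter, text, ...;
  # glue each pair back together.
  chunks = rest.each_slice(2).map(&:join)
  # Keep any text that appeared before the first delimiter.
  head.nil? || head.empty? ? chunks : [head, *chunks]
end

str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
arr = magic_split(str, 'Hello:')
# => ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]
```

Because split does not care about newlines, the same helper should also work on the multi-line "Certificate:" input.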

Upvotes: 3

Stephan

Reputation: 43013

Try this regex:

(Hello:\s+.+?)(?=Hello:|$)

Demo

http://rubular.com/r/l5WD6A1a2r
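A sketch of how this regex might be used from Ruby with String#scan. The second part is an assumed adaptation for the multi-line "Certificate:" input, not something the answer itself states, and the sample certificate text is made up:

```ruby
str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'

# scan returns one array per match when the regex has a capture group,
# so flatten to get plain strings.
arr = str.scan(/(Hello:\s+.+?)(?=Hello:|$)/).flatten
# => ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]

# Assumed adaptation for multi-line input: /m lets . match newlines,
# and \z anchors at the end of the whole string rather than a line end.
text = "Certificate:\n  Data: a\nCertificate:\n  Data: b\n"
certs = text.scan(/Certificate:\s+.+?(?=Certificate:|\z)/m)
# => ["Certificate:\n  Data: a\n", "Certificate:\n  Data: b\n"]
```

In Ruby, $ matches at the end of every line, which is why the multi-line variant switches to \z.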

Upvotes: 5

the Tin Man

Reputation: 160551

This is a common case for using slice_before:

text = "Certificate:
    Data: ...
    Signature Algorithm: ...
...
-----BEGIN CERTIFICATE-----
F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD
2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/
ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp
fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX
epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG
KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5
...
Certificate:
...
Certificate:
...
"

certificates = text.lines.slice_before(/^Certificate/).to_a
# => [["Certificate:\n",
#      "    Data: ...\n",
#      "    Signature Algorithm: ...\n",
#      "...\n",
#      "-----BEGIN CERTIFICATE-----\n",
#      "F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n",
#      "2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n",
#      "ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n",
#      "fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n",
#      "epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n",
#      "KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n",
#      "...\n"],
#     ["Certificate:\n", "...\n"],
#     ["Certificate:\n", "...\n"]]

slice_before walks through an Enumerable (here, the Array of lines) looking for elements that match a pattern. Each match starts a new sub-array, which collects that line and every line after it up to the next match. In the output above you can see a separate sub-array for each certificate.

It's an amazingly useful method.
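To illustrate on a smaller input, here is a sketch with made-up certificate contents. Joining each sub-array gives one string per certificate, which is the shape the question asks for:

```ruby
text = "Certificate:\n  Data: a\nCertificate:\n  Data: b\n"

# Each line matching the pattern starts a new group of lines.
groups = text.lines.slice_before(/^Certificate:/).to_a
# => [["Certificate:\n", "  Data: a\n"], ["Certificate:\n", "  Data: b\n"]]

# Join every group of lines back into a single string.
certs = groups.map(&:join)
# => ["Certificate:\n  Data: a\n", "Certificate:\n  Data: b\n"]

# Elements before the first match stay together in the first group.
letters = %w[x a b a c].slice_before('a').to_a
# => [["x"], ["a", "b"], ["a", "c"]]
```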

If, after slicing, you want to grab an encoded certificate, extract just those lines, because they should be at set offsets:

certificates.first[5 .. 10]
# => ["F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n",
#     "2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n",
#     "ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n",
#     "fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n",
#     "epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n",
#     "KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n"]

Upvotes: 5

O-I

Reputation: 1575

Not sure if this will work for your particular case, but you could try:

splitta = "Hello: "
str.split(splitta).drop(1).map { |s| splitta + s }

which returns

=> ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]

Upvotes: 1

alpha bravo

Reputation: 7948

Try this pattern: (Hello:\s*(?:(?:(?!Hello:).)*))
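A sketch of how this pattern could be applied with String#scan. The pattern consumes characters one at a time as long as the next "Hello:" does not start at the current position. The multi-line variant below is an assumption about how it would be adapted to the certificate file, with made-up sample text:

```ruby
str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'

# The outer capture group makes scan return one-element arrays; flatten them.
arr = str.scan(/(Hello:\s*(?:(?:(?!Hello:).)*))/).flatten
# => ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]

# Assumed adaptation: /m lets . cross newlines, so each match runs
# until the next "Certificate:" or the end of the string.
text = "Certificate:\n  Data: a\nCertificate:\n  Data: b\n"
certs = text.scan(/Certificate:(?:(?!Certificate:).)*/m)
# => ["Certificate:\n  Data: a\n", "Certificate:\n  Data: b\n"]
```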

Upvotes: 0

Philip Hallstrom

Reputation: 19879

> str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
 => "Hello: Alice Hello: Bob Hello: Charlie Hello: David"
> str.scan(/Hello: \w+\b/)
 => ["Hello: Alice", "Hello: Bob", "Hello: Charlie", "Hello: David"]

This depends on each name consisting only of word characters (letters, digits, and underscores), and it drops the trailing space, but it does work for your example.
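A short sketch of that limitation, using a made-up name containing a hyphen. For comparison it also shows a zero-width lookahead split, which is not part of this answer but does not depend on what sits between the delimiters:

```ruby
str = 'Hello: Mary-Ann Hello: Bob'

# \w+ stops at the hyphen, so the first name is truncated.
truncated = str.scan(/Hello: \w+\b/)
# => ["Hello: Mary", "Hello: Bob"]

# Splitting at the position just before each "Hello:" consumes nothing,
# so the delimiter stays attached to the text that follows it.
pieces = str.split(/(?=Hello:)/)
# => ["Hello: Mary-Ann ", "Hello: Bob"]
```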

Upvotes: 4
