Reputation: 6084
Previous related questions only cover delimiters of length 1.
What I want is the following (for example):
str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
arr = str.magic_split('Hello:')
=> arr[0] = 'Hello: Alice '
arr[1] = 'Hello: Bob '
arr[2] = 'Hello: Charlie '
arr[3] = 'Hello: David'
I tried str.scan(/Hello:/), but I don't know how to write a regex that makes it work. Thanks a lot.
I see that some of the answers only work for this particular case. Let me be more specific.
The file I want to split looks like the following, and the delimiter is "Certificate:":
Certificate:
Data: ...
Signature Algorithm: ...
...
-----BEGIN CERTIFICATE-----
F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n
2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n
ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n
fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n
epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n
KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n
...
Certificate:
...
Certificate:
...
Basically, between consecutive "Certificate:" lines there will be arbitrary ASCII characters.
Thanks again.
Upvotes: 2
Views: 744
Reputation: 110665
So many ways...
str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
str.split("Hello:")[1..-1].map {|s| "Hello:"+s}
or
str.split(/(Hello:)/)[1..-1].each_slice(2).map(&:join)
Note that the latter method uses a regex with the string "Hello:" in a capture group,
so split keeps the delimiters in its result:
str.split(/(Hello:)/)
#=> ["", "Hello:", " Alice ", "Hello:", " Bob ",
# "Hello:", " Charlie ", "Hello:", " David"]
whereas:
str.split(/Hello:/)
#=> ["", " Alice ", " Bob ", " Charlie ", " David"]
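Another option, not used in the snippets above, is to split on a zero-width lookahead so the delimiter is never consumed. The following is a minimal sketch; the helper name split_keeping_delimiter is made up for illustration and is not part of Ruby's String API.

```ruby
# Hypothetical helper; the name is an assumption for this example.
def split_keeping_delimiter(string, delimiter)
  # (?=...) is a zero-width lookahead: it matches the position just before
  # the delimiter without consuming it, so the delimiter stays in each piece.
  # Regexp.escape makes the delimiter match literally.
  string.split(/(?=#{Regexp.escape(delimiter)})/)
end

str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
parts = split_keeping_delimiter(str, 'Hello:')
p parts
# => ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]
```

Because nothing is consumed, there is no need to drop a leading empty string or glue the delimiter back on afterwards.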
Upvotes: 3
Reputation: 43013
Try this regex:
(Hello:\s+.+?)(?=Hello:|$)
http://rubular.com/r/l5WD6A1a2r
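In Ruby, that pattern could be applied with String#scan, roughly as sketched below. The second half is an assumed adaptation for multi-line input such as the certificate file: the /m flag, the \z anchor, and the short sample text are not from the answer above.

```ruby
str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'

# scan returns one sub-array per match when the pattern has a capture group,
# so flatten turns the result into a plain array of strings.
greetings = str.scan(/(Hello:\s+.+?)(?=Hello:|$)/).flatten
p greetings
# => ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]

# Assumed adaptation for multi-line input: /m lets "." match newlines and
# \z anchors at the very end of the string. The sample text is made up.
text = "Certificate:\nfirst\nCertificate:\nsecond\n"
blocks = text.scan(/Certificate:.*?(?=Certificate:|\z)/m)
p blocks
# => ["Certificate:\nfirst\n", "Certificate:\nsecond\n"]
```

Without /m the dot stops at each newline, so the original pattern would not span a whole certificate block.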
Upvotes: 5
Reputation: 160551
This is a common case for using slice_before:
text = "Certificate:
Data: ...
Signature Algorithm: ...
...
-----BEGIN CERTIFICATE-----
F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD
2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/
ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp
fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX
epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG
KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5
...
Certificate:
...
Certificate:
...
"
certificates = text.lines.slice_before(/^Certificate/).to_a
# => [["Certificate:\n",
# " Data: ...\n",
# " Signature Algorithm: ...\n",
# "...\n",
# "-----BEGIN CERTIFICATE-----\n",
# "F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n",
# "2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n",
# "ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n",
# "fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n",
# "epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n",
# "KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n",
# "...\n"],
# ["Certificate:\n", "...\n"],
# ["Certificate:\n", "...\n"]]
slice_before walks through the array looking for lines that match a pattern. Each match starts a new sub-array, which collects lines until the next match. In the output above you can see a separate sub-array for each certificate.
It's an amazingly useful method.
If, after slicing, you want to grab an encoded certificate, extract just those lines, because they should be at set offsets:
certificates.first[5 .. 10]
# => ["F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n",
# "2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n",
# "ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n",
# "fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n",
# "epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n",
# "KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n"]
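If each certificate is wanted as a single string rather than an array of lines, the sub-arrays can be joined back together. This is a small sketch using a made-up two-block sample in place of the real certificate text.

```ruby
# Made-up sample text standing in for the real certificate dump.
text = "Certificate:\nfirst\nCertificate:\nsecond\n"

# slice_before starts a new chunk at every line matching the pattern;
# join glues each chunk's lines back into a single string.
blocks = text.lines.slice_before(/^Certificate:/).map(&:join)
p blocks
# => ["Certificate:\nfirst\n", "Certificate:\nsecond\n"]
```

Each line keeps its trailing newline from String#lines, so joining with no separator reproduces the original text of every block.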
Upvotes: 5
Reputation: 1575
Not sure if this will work for your particular case, but you could try:
splitta = "Hello: "
str.split(splitta).drop(1).map { |s| splitta + s }
which returns
=> ["Hello: Alice ", "Hello: Bob ", "Hello: Charlie ", "Hello: David"]
Upvotes: 1
Reputation: 19879
> str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
=> "Hello: Alice Hello: Bob Hello: Charlie Hello: David"
> str.scan(/Hello: \w+\b/)
=> ["Hello: Alice", "Hello: Bob", "Hello: Charlie", "Hello: David"]
This depends on each name containing only word characters (letters, digits, and underscores, which is what \w matches), but it does work for your case.
Upvotes: 4