Reputation: 48495
I have been looking for an elegant and efficient way to chunk a string into substrings of a given length in Ruby.
So far, the best I could come up with is this:
def chunk(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end
>> chunk("abcdef",3)
=> ["abc", "def"]
>> chunk("abcde",3)
=> ["abc", "de"]
>> chunk("abc",3)
=> ["abc"]
>> chunk("ab",3)
=> ["ab"]
>> chunk("",3)
=> []
You might want chunk("", n) to return [""] instead of []. If so, just add this as the first line of the method:
return [""] if string.empty?
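With that guard in place, the method would read as follows. This is a minimal sketch that assumes size is a positive Integer:

```ruby
# Slice-based chunking from the question, plus the optional guard so that
# an empty string yields [""] instead of [].
def chunk(string, size)
  return [""] if string.empty?
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end

p chunk("abcde", 3) # => ["abc", "de"]
p chunk("", 3)      # => [""]
```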
Would you recommend any better solution?
Edit
Thanks to Jeremy Ruten for this elegant and efficient solution [edit: NOT efficient, see below!]:
def chunk(string, size)
  string.scan(/.{1,#{size}}/)
end
Edit
The string.scan solution takes about 60 seconds to chop 512k into 1k chunks 10000 times, compared with the original slice-based solution which only takes 2.4 seconds.
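A comparison like this one can be reproduced with Ruby's standard Benchmark library. The sketch below is scaled down (100 iterations instead of 10000) and uses a hypothetical test string of 512 KB of "x"; the method names chunk_slice and chunk_scan are only illustrative. Timings will vary by machine and Ruby version, so none are quoted here.

```ruby
require 'benchmark'

# Slice-based version from the question
def chunk_slice(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end

# Regexp-based version using String#scan
def chunk_scan(string, size)
  string.scan(/.{1,#{size}}/)
end

data = "x" * (512 * 1024) # 512 KB of single-byte characters, no newlines
iterations = 100          # far fewer than 10000, to keep the run short

slice_time = Benchmark.realtime { iterations.times { chunk_slice(data, 1024) } }
scan_time  = Benchmark.realtime { iterations.times { chunk_scan(data, 1024) } }

puts format("slice: %.3fs  scan: %.3fs", slice_time, scan_time)
```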
Upvotes: 103
Views: 43730
Reputation: 2371
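Here is another way to do it (Array#to_s only concatenated the elements on Ruby 1.8; on 1.9 and later it returns the inspect form, so join is used instead):
"abcdefghijklmnopqrstuvwxyz".chars.to_a.each_slice(3).to_a.map { |s| s.join }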
Or,
"abcdefghijklmnopqrstuvwxyz".chars.each_slice(3).map(&:join)
Either one returns:
=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
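A small variation, assuming Ruby 2.0 or later (where String#chars builds a full Array of one-character strings): String#each_char without a block returns an Enumerator, so each_slice can group the characters without that intermediate array being built first.

```ruby
# each_char yields characters one at a time; each_slice groups them into
# arrays of up to `size` characters, and join turns each group back into a string.
def chunk(string, size)
  string.each_char.each_slice(size).map(&:join)
end

p chunk("abcdefghijklmnopqrstuvwxyz", 3)
# => ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
```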
Upvotes: 25
Reputation: 8769
I personally followed the idea of user8556428: avoid the costly intermediate values that most proposals introduce, and avoid modifying the input string. I also want to be able to use it as an enumerator (for instance, to call s.each_slice.with_index).
My use case is really about bytes, not characters. For character-sized chunks, strscan is a great solution.
class String
  # Slices of fixed byte-length. May cut multi-byte characters.
  def each_slice(n = 1000)
    # Without a block, return an Enumerator so that .to_a, .with_index etc. work
    return enum_for(__method__, n) unless block_given?
    return if empty?
    # bytesize/byteslice count bytes; length/slice would count characters
    last = (bytesize - 1) / n
    (0..last).each do |i|
      yield byteslice(i * n, n)
    end
  end
end
p "abcdef".each_slice(3).to_a # => ["abc", "def"]
p "abcde".each_slice(3).to_a # => ["abc", "de"]
p "abc".each_slice(3).to_a # => ["abc"]
p "ab".each_slice(3).to_a # => ["ab"]
p "".each_slice(3).to_a # => []
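For byte-oriented slicing, String#bytesize and String#byteslice count bytes, whereas String#length and String#slice count characters. The sketch below uses a hypothetical standalone helper, each_byte_slice, instead of patching String, to show the with_index usage mentioned above and how a byte slice can split a multi-byte UTF-8 character (it assumes a UTF-8 source file, the default since Ruby 2.0):

```ruby
# Hypothetical helper (not part of Ruby's String API): yields consecutive
# slices of at most n bytes, or returns an Enumerator when no block is given.
def each_byte_slice(str, n)
  return enum_for(__method__, str, n) unless block_given?
  (0...str.bytesize).step(n) { |offset| yield str.byteslice(offset, n) }
end

each_byte_slice("abcdefgh", 3).with_index do |slice, i|
  puts "#{i}: #{slice}"
end
# 0: abc
# 1: def
# 2: gh

# "é" takes 2 bytes in UTF-8, so 2-byte slices of "héllo" cut it in half.
p each_byte_slice("héllo", 2).map(&:valid_encoding?) # => [false, false, true]
```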
Upvotes: 0
Reputation: 1013
Here is another solution for a slightly different case: processing large strings when there is no need to store all the chunks at once. This way only a single chunk is held at a time, and it performs much faster than slicing strings:
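require 'stringio'

io = StringIO.new(string)  # string and chunk_size are assumed to be defined
until io.eof?
  chunk = io.read(chunk_size)  # reads at most chunk_size bytes
  do_something(chunk)          # do_something is a placeholder for your processing
end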
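The same loop can be wrapped so callers get one chunk at a time through a block or an Enumerator. This is a sketch with a hypothetical helper name, each_io_chunk. Note that StringIO#read with a length counts bytes and returns binary (ASCII-8BIT) strings, so multi-byte text may need force_encoding afterwards.

```ruby
require 'stringio'

# Hypothetical helper: yields successive chunks of at most chunk_size bytes
# read from an in-memory StringIO, so only one chunk is held at a time.
def each_io_chunk(string, chunk_size)
  return enum_for(__method__, string, chunk_size) unless block_given?
  io = StringIO.new(string)
  until io.eof?
    yield io.read(chunk_size)
  end
end

each_io_chunk("abcdefgh", 3) { |chunk| puts chunk }
# abc
# def
# gh
```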
Upvotes: 7
Reputation: 61
I made a little test that chops about 593 MB of data into 18991 pieces of 32 KB. Your slice+map version ran for at least 15 minutes at 100% CPU before I pressed Ctrl+C. This version using String#unpack finished in 3.6 seconds:
def chunk(string, size)
  # The "a" directive of unpack takes raw bytes, so count bytes, not characters
  string.unpack("a#{size}" * (string.bytesize / size.to_f).ceil)
end
Upvotes: 6
Reputation: 152
A better solution, which takes into account that the last part of the string may be shorter than the chunk size:
def chunk(inStr, sz)
  return [inStr] if inStr.length < sz
  m = inStr.length % sz # length of the last, partial chunk
  partial = (inStr.length / sz).times.collect { |i| inStr[i * sz, sz] }
  partial << inStr[-m..-1] if m != 0 # add the last part
  partial
end
Upvotes: 1
Reputation: 1433
I think this is the most efficient solution if you know your string's length is a multiple of the chunk size:
def chunk(string, size)
  (string.length / size).times.collect { |i| string[i * size, size] }
end
and for splitting into a given number of parts (again assuming the length is a multiple of count):
def parts(string, count)
  size = string.length / count
  count.times.collect { |i| string[i * size, size] }
end
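When the length is not a multiple of count, parts as written drops the leftover characters. One way to keep them is to give the first (length % count) parts one extra character each. This is a hypothetical variant (the name parts_with_remainder is illustrative), assuming count is a positive Integer:

```ruby
# Splits string into exactly `count` parts whose lengths differ by at most 1;
# the first (length % count) parts get one extra character.
def parts_with_remainder(string, count)
  size, extra = string.length.divmod(count)
  offset = 0
  Array.new(count) do |i|
    len = i < extra ? size + 1 : size
    part = string[offset, len]
    offset += len
    part
  end
end

p parts_with_remainder("abcdefghij", 3) # => ["abcd", "efg", "hij"]
```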
Upvotes: 6
Reputation: 176753
Use String#scan:
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,3}/)
=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
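One caveat, if the input can contain newlines: in a Ruby regexp, "." does not match "\n" unless the /m (multiline) flag is set. Without it, scan skips each newline and the chunk boundaries shift after it. A short illustration with a hypothetical input:

```ruby
text = "abc\ndefg"

# Without /m the newline is skipped entirely
p text.scan(/.{1,3}/)  # => ["abc", "def", "g"]

# With /m, "." also matches "\n", so every character is kept
p text.scan(/.{1,3}/m) # => ["abc", "\nde", "fg"]
```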
Upvotes: 177
Reputation: 237100
test.split(/(...)/).reject {|v| v.empty?}
The reject is necessary because otherwise the result includes the empty strings between matches. My regex-fu isn't quite up to seeing how to fix that right off the top of my head.
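To see where the empty strings come from: split with a capture group returns the (empty) text between consecutive matches as well as the captured groups. A sketch with a hypothetical wrapper, chunk3, using the reject(&:empty?) shorthand and the /m flag so that "." also matches newlines:

```ruby
p "abcdefgh".split(/(...)/) # => ["", "abc", "", "def", "gh"]

# Hypothetical wrapper: drop the empty separators, keep the captured chunks.
def chunk3(str)
  str.split(/(...)/m).reject(&:empty?)
end

p chunk3("abcdefgh") # => ["abc", "def", "gh"]
```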
Upvotes: 1
Reputation: 112406
Are there some other constraints you have in mind? Otherwise I'd be awfully tempted to do something simple like
(0...(str.length + w - 1) / w).map { |i| str[i * w, w] }
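Another simple option steps through the start offsets directly instead of computing the number of chunks first. This is a sketch assuming w is a positive Integer:

```ruby
# Visit offsets 0, w, 2w, ... and take up to w characters from each;
# the last slice is simply shorter when the length isn't a multiple of w.
def chunk(str, w)
  (0...str.length).step(w).map { |i| str[i, w] }
end

p chunk("abcdefghijk", 4) # => ["abcd", "efgh", "ijk"]
```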
Upvotes: 0