Reputation: 45943
Splitting on whitespace, period, comma or double quotes, and not on single quotes:
str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.split(/\s|\.|,|"/)
=> ["this", "is", "the", "string", "", "", "", "to's", "split", "real", "", "ok", "", "nice-like"]
How can I elegantly remove the empty strings?
How can I elegantly remove strings that are shorter than MIN_LENGTH?
Upvotes: 7
Views: 3673
Reputation: 5914
There are several ways to do this:
> str.split(/[\s\.,"]/) - [""]
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
> str.split(/[\s\.,"]/).select{|sub_string| sub_string.present?}  # present? needs ActiveSupport (Rails); in plain Ruby use reject(&:empty?)
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
> str.scan /\w+'?\w+/
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice", "like"]
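Outside Rails, where present? is not defined, the second variant can be sketched in plain Ruby as below. This is only an illustration; the variable name tokens is my own and not part of the answer.

```ruby
str = %Q{this is the.string to's split,real "ok" nice-like.}

# Split on any single separator, then drop the empty strings that
# appear wherever two separators sit next to each other.
tokens = str.split(/[\s.,"]/).reject(&:empty?)

p tokens
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
```

Note that the scan variant above is not equivalent to the two split variants: it breaks "nice-like" into two tokens, and because its pattern needs at least two word characters it would skip a one-letter word such as "a".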
Upvotes: 2
Reputation: 168101
The idea of using split is not right in this case. You should be using scan.
str = %Q{this is the.string to's split,real "ok" nice-like.}
str.scan(/[\w'-]+/)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
To match only strings that are MIN_LENGTH or longer, do this:
MIN_LENGTH = 3
str.scan(/[\w'-]{#{MIN_LENGTH},}/)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
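If the minimum length varies, the same pattern could be wrapped in a small method. The sketch below assumes a hypothetical helper named long_tokens; it is not part of the answer, just one way to parameterize the scan call.

```ruby
# Hypothetical helper: returns every run of word characters, apostrophes,
# or hyphens in `text` that is at least `min_length` characters long.
# Integer() guards against interpolating a non-numeric value into the regex.
def long_tokens(text, min_length)
  text.scan(/[\w'-]{#{Integer(min_length)},}/)
end

str = %Q{this is the.string to's split,real "ok" nice-like.}

p long_tokens(str, 3)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
p long_tokens(str, 5)
# => ["string", "split", "nice-like"]
```

Because each match is a maximal run of the allowed characters, a run shorter than the minimum produces no match at all, so no separate filtering step is needed.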
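When to use split, when to use scan
When the delimiters are messy or irregular, use scan.
When the delimiters are simple and regular, use split.
When the things you want to extract are simple and regular, use scan.
When the things you want to extract are messy or irregular, use split.
Upvotes: 8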
Reputation: 20000
I'm not entirely clear on the problem domain, but if you just want to avoid the empty strings, why not split on one or more occurrences of your separators?
str.split /[\s\.,"]+/
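A quick sketch of how this behaves, including one edge case worth knowing about: split drops trailing empty strings, but a separator at the very start of the input still yields one empty string at the front. The variable names parts, quoted, with_leading, and cleaned are illustrative only.

```ruby
str = %Q{this is the.string to's split,real "ok" nice-like.}

# A run of adjacent separators now counts as a single delimiter.
parts = str.split(/[\s.,"]+/)
p parts
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

# Edge case: input that begins with a separator.
quoted = %Q{"ok" then}
with_leading = quoted.split(/[\s.,"]+/)
p with_leading
# => ["", "ok", "then"]

# Rejecting empty strings afterwards covers that case as well.
cleaned = with_leading.reject(&:empty?)
p cleaned
# => ["ok", "then"]
```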
Upvotes: 8
Reputation: 22258
MIN_LENGTH = 2
new_strings = str.split(/\s|\.|,|"/).reject{ |s| s.length < MIN_LENGTH }
Upvotes: 1
Reputation: 3152
I would think a simple way to do that is as follows:
str.split(/\s|\.|,|"/).select{|s| s.length >= MIN_LENGTH}
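Since an empty string has length 0, a length filter with any minimum of 1 or more also removes the empty strings, so this one step covers both parts of the question. A minimal runnable sketch, using a local variable min_length as a stand-in for the MIN_LENGTH constant:

```ruby
str = %Q{this is the.string to's split,real "ok" nice-like.}
min_length = 3 # stand-in for MIN_LENGTH

# Keep only the pieces that are long enough; empty strings (length 0)
# fall out automatically.
words = str.split(/\s|\.|,|"/).select { |s| s.length >= min_length }

p words
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
```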
Upvotes: 6