B Seven

Reputation: 45943

How to split text in Ruby without creating empty strings?

I want to split on whitespace, periods, commas, and double quotes, but not on single quotes:

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.split(/\s|\.|,|"/)
=> ["this", "is", "the", "string", "", "", "", "to's", "split", "real", "", "ok", "", "nice-like"]

How can I elegantly remove the empty strings?

How can I elegantly remove strings that are shorter than MIN_LENGTH?

Upvotes: 7

Views: 3673

Answers (6)

nkm

Reputation: 5914

There are several ways to do this:

 > str.split(/[\s\.,"]/) - [""]
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

 > str.split(/[\s\.,"]/).select{|sub_string| sub_string.present?}  # present? requires ActiveSupport (Rails)
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

 > str.scan /\w+'?\w+/
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice", "like"]
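
Outside Rails, where `present?` is unavailable, the second form can be written with core methods only. A minimal sketch, assuming the same `str` as in the question (the variable name `words` is only for illustration):

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}

# reject(&:empty?) drops the zero-length strings left between adjacent delimiters
words = str.split(/[\s.,"]/).reject(&:empty?)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
```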

Upvotes: 2

sawa

Reputation: 168101

The idea of using split is not right in this case. You should be using scan.

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.scan(/[\w'-]+/)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

To match only strings that are MIN_LENGTH or longer, add a lower bound to the quantifier:

MIN_LENGTH = 3
str.scan(/[\w'-]{#{MIN_LENGTH},}/)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]

When to use split, when to use scan

  • When the delimiters are messy and making a regex match them is difficult, use scan.
  • When the substrings to extract are messy and making a regex match them is difficult, use split.
  • When you want to impose conditions on the form of the substrings to be extracted, use scan.
  • When you want to impose conditions on the form of the delimiters, use split.
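
As a rough illustration of the contrast above, the two calls below describe the same input from opposite sides: one regex names what to keep, the other names what to discard. The variable names `kept` and `pieces` are made up for this sketch.

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}

# scan: describe the substrings to keep (runs of word characters, ' or -)
kept = str.scan(/[\w'-]+/)

# split: describe the delimiters to drop (runs of whitespace, . , or ")
pieces = str.split(/[\s.,"]+/)

same = (kept == pieces)  # => true for this particular input
```

The two only agree here because every character in the string is either in the "keep" class or in the "drop" class; with other input they can differ.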

Upvotes: 8

Tobias Cohen

Reputation: 20000

I'm not entirely clear on the problem domain, but if you just want to avoid the empty strings, why not split on one or more occurrences of your separators?

str.split /[\s\.,"]+/
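
One caveat worth checking, sketched below with a made-up second input (`quoted`): `split` drops trailing empty strings by default but keeps a leading one, so a string that begins with a delimiter still yields one empty element.

```ruby
delimiters = /[\s.,"]+/  # same pattern as above, held in a local for reuse

str = %Q{this is the.string    to's split,real "ok" nice-like.}
words = str.split(delimiters)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

quoted = %Q{"ok" nice-like.}                        # hypothetical input starting with a delimiter
with_leading = quoted.split(delimiters)             # => ["", "ok", "nice-like"]
cleaned = quoted.split(delimiters).reject(&:empty?) # => ["ok", "nice-like"]
```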

Upvotes: 8

xdazz

Reputation: 160843

Try the following:

str.split(/\s*[.,"\s]\s*/)
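
This pattern absorbs whitespace around a single delimiter, which is enough for the string in the question. As a hedged check with a made-up input (`sample`), two punctuation marks with no space between them still produce an empty string, whereas a repeated character class does not:

```ruby
sample = %Q{a,"b"}  # hypothetical input with two adjacent delimiters

# Each punctuation mark is its own match, so an empty string appears between them
per_mark = sample.split(/\s*[.,"\s]\s*/)   # => ["a", "", "b"]

# A character class with + treats the whole run as one delimiter
per_run = sample.split(/[\s.,"]+/)         # => ["a", "b"]
```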

Upvotes: 2

Kyle

Reputation: 22258

MIN_LENGTH = 2

new_strings = str.split(/\s|\.|,|"/).reject{ |s| s.length < MIN_LENGTH }

Upvotes: 1

Nikhil

Reputation: 3152

I would think a simple way to do that is as follows:

str.split(/\s|\.|,|"/).select{|s| s.length >= MIN_LENGTH}
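
A self-contained sketch of the same idea, with `MIN_LENGTH` set to 3 purely as an assumed value (the question does not give one). Because an empty string has length 0, the length filter removes the empty strings too, as long as `MIN_LENGTH` is at least 1:

```ruby
MIN_LENGTH = 3  # assumed value; the question leaves it open

str = %Q{this is the.string    to's split,real "ok" nice-like.}

# Keep only the pieces that are at least MIN_LENGTH characters long
long_enough = str.split(/\s|\.|,|"/).select { |s| s.length >= MIN_LENGTH }
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
```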

Upvotes: 6
