Brent Moses

Reputation: 253

Manipulate string in Ruby

I have a group of string variables with values like "height_low". I want to use something clean, such as gsub, to get rid of the underscore and everything after it, so that "height_low" becomes "height". Does anyone have a solution for this? Thanks.

Upvotes: 1

Views: 149

Answers (7)

the Tin Man

Reputation: 160551

Learn to think in terms of searches vs. replacements. It's usually easier, faster, and cleaner to search for, and extract, what you want, than it is to search for, and strip, what you don't want.

Consider this:

'a_b_c'[/^(.*?)_/, 1] # => "a"

It looks only for what you want: the text from the start of the string up to the first _. Because .*? is non-greedy, the capture stops at the first _, and only the captured text is returned.

The alternatives:

'a_b_c'.sub(/_.+$/, '')  # => "a"
'a_b_c'.gsub(/_.+$/, '') # => "a"

have to match from the first _ all the way to the end of the line, and then build a new, truncated string.

Here's a little benchmark showing how that affects things:

require 'fruity'

# s is referenced by the string_index case below.
s = 'a_b_c'

compare do
  string_capture { 'a_b_c'[/^(.*?)_/, 1]                }
  string_sub     { 'a_b_c'.sub(/_.+$/, '')              }
  string_gsub    { 'a_b_c'.gsub(/_.+$/, '')             }
  look_ahead     { 'a_b_c'[/^.+?(?=_)/]                 }
  string_index   { 'a_b_c'[0, s.index("_") || s.length] }
end

# >> Running each test 8192 times. Test will take about 1 second.
# >> string_index is faster than string_capture by 19.999999999999996% ± 10.0%
# >> string_capture is similar to look_ahead
# >> look_ahead is faster than string_sub by 70.0% ± 10.0%
# >> string_sub is faster than string_gsub by 2.9x ± 0.1

Again, searching is going to be faster than any sort of replace, so think about what you're doing.

The downside of the regex-based "search" tactics such as "string_capture" and "look_ahead" is that they don't handle a missing _: they return nil when the string has none. If there's any question whether your string will have a _, use the "string_index" method, which falls back to string.length and grabs the entire string.
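To make the missing-underscore case concrete, here is a small sketch. The helper name prefix_before_underscore is made up for illustration; it just wraps the "string_index" expression from the benchmark.

```ruby
# Hypothetical helper (the name is illustrative only): returns the text
# before the first "_", or the whole string when there is no "_".
def prefix_before_underscore(s)
  s[0, s.index('_') || s.length]
end

with_underscore    = 'height_low'
without_underscore = 'height'

# The regex "search" forms find nothing to match, so they return nil.
capture_result    = without_underscore[/^(.*?)_/, 1]  # => nil
look_ahead_result = without_underscore[/^.+?(?=_)/]   # => nil

# The index form falls back to the full length of the string.
index_result = prefix_before_underscore(without_underscore)  # => "height"
```

If nil is an acceptable result for strings without an underscore, the regex forms are still fine; the index form only matters when the original string should pass through unchanged.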

Upvotes: 0

Arup Rakshit

Reputation: 118261

Try the following, which uses str[regexp, capture] → new_str or nil:

If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.

strings.map { |s| s[/(.*?)_.*$/, 1] }
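As a quick illustration of the "index or name" part, the sketch below runs the same lookup with a numbered capture and with a named one. The sample strings and the group name prefix are assumptions chosen for the example, not part of the question.

```ruby
# Sample data (assumed for illustration); "depth" has no underscore.
strings = ["height_low", "width_high", "depth"]

# Capture group selected by index.
by_index = strings.map { |s| s[/(.*?)_.*$/, 1] }

# The same lookup with a named group, selected by its name.
by_name = strings.map { |s| s[/(?<prefix>.*?)_/, "prefix"] }

# Both give ["height", "width", nil]; no match means nil.
```

Note that map returns a new array, so strings itself is left unchanged.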

Upvotes: 1

jonahb

Reputation: 2580

FWIW, solutions based on String#split perform poorly because they have to parse the whole string and allocate an array. Their performance degrades as the number of underscores increases. The following performs better:

string[0, string.index("_") || string.length]

Benchmark results (with the number of underscores in parentheses):

                       user     system      total        real
String#split (0)   0.640000   0.000000   0.640000 (  0.650323)
String#split (1)   0.760000   0.000000   0.760000 (  0.759951)
String#split (9)   2.180000   0.010000   2.190000 (  2.192356)
String#index (0)   0.610000   0.000000   0.610000 (  0.625972)
String#index (1)   0.580000   0.010000   0.590000 (  0.589463)
String#index (9)   0.600000   0.000000   0.600000 (  0.605253)

Benchmark code:

require 'benchmark'

strings = ["x", "x_x", "x_x_x_x_x_x_x_x_x_x"]

Benchmark.bm(16) do |bm|
    strings.each do |string|
        bm.report("String#split (#{string.count("_")})") do
            1000000.times { string.split("_").first }
        end
    end
    strings.each do |string|
        bm.report("String#index (#{string.count("_")})") do
            1000000.times { string[0, string.index("_") || string.length] }
        end
    end
end
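A variation that was not part of the benchmark above: String#split accepts a limit argument, so it can stop after the first underscore instead of splitting the whole string. This is only a sketch of the behaviour. It still allocates an array, and no timing is claimed for it here.

```ruby
string = "x_x_x_x_x_x_x_x_x_x"

# Without a limit, every underscore produces another array element.
full_split = string.split("_")        # 10 elements

# With a limit of 2, splitting stops after the first underscore.
limited_split = string.split("_", 2)  # => ["x", "x_x_x_x_x_x_x_x_x"]

first_part = limited_split.first      # => "x"
```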

Upvotes: 1

Ajedi32

Reputation: 48318

If you're looking for something "like gsub", why not just use gsub?

"height_low".gsub(/_.*$/, "") #=> "height"

In my opinion though, this is a bit cleaner:

"height_low".split('_').first #=> "height"

Another option is to use partition:

"height_low".partition("_").first #=> "height"
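The three options agree on the example from the question but differ slightly at the edges. The sketch below runs them over a few sample inputs, which are assumptions picked to show those edges:

```ruby
# Sample inputs (assumed): a normal case, no underscore,
# a leading underscore, and an empty string.
inputs = ["height_low", "height", "_low", ""]

via_gsub      = inputs.map { |s| s.gsub(/_.*$/, "") }
via_split     = inputs.map { |s| s.split("_").first }
via_partition = inputs.map { |s| s.partition("_").first }

# via_gsub      => ["height", "height", "", ""]
# via_split     => ["height", "height", "", nil]
# via_partition => ["height", "height", "", ""]
```

The only difference here is the empty string: split returns an empty array, so first gives nil, while gsub and partition both give "".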

Upvotes: 0

Marcelo De Polli

Reputation: 29281

The unavoidable regex answer. (Assuming strings is an array of strings.)

strings.map! { |s| s[/^.+?(?=_)/] }

Upvotes: 1

lipanski

Reputation: 1763

Shorter:

my_string.split('_').first

Upvotes: 1

Linuxios

Reputation: 35783

Try this:

strings.map! {|s| s.split('_').first}

Upvotes: 3
