Reputation: 253
I have a group of string variables with values like "height_low". I want to use something clean, like gsub, to get rid of the underscore and everything after it, so the result is "height". Does someone have a solution for this? Thanks.
Upvotes: 1
Views: 149
Reputation: 160551
Learn to think in terms of searches vs. replacements. It's usually easier, faster, and cleaner to search for, and extract, what you want, than it is to search for, and strip, what you don't want.
Consider this:
'a_b_c'[/^(.*?)_/, 1] # => "a"
It looks only for what you want, which is the text from the start of the string up to the first _. Everything preceding that _ is captured and returned.
The alternatives:
'a_b_c'.sub(/_.+$/, '') # => "a"
'a_b_c'.gsub(/_.+$/, '') # => "a"
have to match from the first _ all the way to the end of the string before the string can be truncated.
Here's a little benchmark showing how that affects things:
require 'fruity'
compare do
  string_capture { 'a_b_c'[/^(.*?)_/, 1] }
  string_sub     { 'a_b_c'.sub(/_.+$/, '') }
  string_gsub    { 'a_b_c'.gsub(/_.+$/, '') }
  look_ahead     { 'a_b_c'[/^.+?(?=_)/] }
  # s must be assigned inside the block; it was undefined before
  string_index   { s = 'a_b_c'; s[0, s.index('_') || s.length] }
end
# >> Running each test 8192 times. Test will take about 1 second.
# >> string_index is faster than string_capture by 19.999999999999996% ± 10.0%
# >> string_capture is similar to look_ahead
# >> look_ahead is faster than string_sub by 70.0% ± 10.0%
# >> string_sub is faster than string_gsub by 2.9x ± 0.1
Again, searching is going to be faster than any sort of replace, so think about what you're doing.
The downside of the regex-based "search" tactics like "string_capture" and "look_ahead" is that they don't handle a missing _ (they return nil), so if there's any question whether your string will have a _, use the "string_index" method, which falls back to string.length to grab the entire string.
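To make that difference concrete, here is a minimal sketch comparing the two tactics on a string with and without an underscore. The method names and sample strings are made up for illustration and aren't part of the benchmark above.

```ruby
# Hypothetical helper: regex capture, returns nil when there is no underscore.
def prefix_by_capture(s)
  s[/^(.*?)_/, 1]
end

# Hypothetical helper: index-based slice, falls back to the whole string.
def prefix_by_index(s)
  s[0, s.index('_') || s.length]
end

p prefix_by_capture('height_low') # => "height"
p prefix_by_capture('height')     # => nil
p prefix_by_index('height_low')   # => "height"
p prefix_by_index('height')       # => "height"
```

If nil is an acceptable signal for "no underscore found", the capture version is still fine; the index version is the safer choice when you always want a string back.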
Upvotes: 0
Reputation: 118261
Try the following, using str[regexp, capture] → new_str or nil:
If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.
strings.map { |s| s[/(.*?)_.*$/,1] }
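Since the capture argument can also be a group name, the same idea can be sketched with a named group. The group name "prefix", the helper method, and the sample array below are illustrative assumptions, not something from the question.

```ruby
# Hypothetical sample data standing in for the asker's strings.
strings = ['height_low', 'width_high']

# Hypothetical helper: select the named group "prefix" instead of index 1.
def prefix_by_name(s)
  s[/(?<prefix>.*?)_/, 'prefix']
end

p strings.map { |s| prefix_by_name(s) } # => ["height", "width"]
```

As with the indexed form, this returns nil for a string that contains no underscore.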
Upvotes: 1
Reputation: 2580
FWIW, solutions based on String#split perform poorly because they have to parse the whole string and allocate an array. Their performance degrades as the number of underscores increases. The following performs better:
string[0, string.index("_") || string.length]
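Speed aside, the two approaches also differ on a couple of edge cases. A rough sketch, with helper names invented for the comparison:

```ruby
# Hypothetical helper wrapping the index-based slice shown above.
def prefix_by_index(string)
  string[0, string.index('_') || string.length]
end

# Hypothetical helper wrapping the split-based approach.
def prefix_by_split(string)
  string.split('_').first
end

p prefix_by_index('height_low') # => "height"
p prefix_by_split('height_low') # => "height"

# Edge cases: split returns an empty array here, so .first is nil.
p prefix_by_index('')  # => ""
p prefix_by_split('')  # => nil
p prefix_by_index('_') # => ""
p prefix_by_split('_') # => nil
```

So the index version always returns a string, while the split version can return nil for an empty string or a lone underscore.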
Benchmark results (with number of underscores in parentheses):
                       user     system      total        real
String#split (0)   0.640000   0.000000   0.640000 (  0.650323)
String#split (1)   0.760000   0.000000   0.760000 (  0.759951)
String#split (9)   2.180000   0.010000   2.190000 (  2.192356)
String#index (0)   0.610000   0.000000   0.610000 (  0.625972)
String#index (1)   0.580000   0.010000   0.590000 (  0.589463)
String#index (9)   0.600000   0.000000   0.600000 (  0.605253)
Benchmarks:
require 'benchmark'
# Test strings with 0, 1, and 9 underscores
strings = ["x", "x_x", "x_x_x_x_x_x_x_x_x_x"]
Benchmark.bm(16) do |bm|
  strings.each do |string|
    bm.report("String#split (#{string.count("_")})") do
      1000000.times { string.split("_").first }
    end
  end
  strings.each do |string|
    bm.report("String#index (#{string.count("_")})") do
      1000000.times { string[0, string.index("_") || string.length] }
    end
  end
end
Upvotes: 1
Reputation: 48318
If you're looking for something "like gsub", why not just use gsub?
"height_low".gsub(/_.*$/, "") #=> "height"
In my opinion though, this is a bit cleaner:
"height_low".split('_').first #=> "height"
Another option is to use partition:
"height_low".partition("_").first #=> "height"
Upvotes: 0
Reputation: 29281
The unavoidable regex answer (assuming strings is an array of strings):
strings.map! { |s| s[/^.+?(?=_)/] }
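One caveat: the lookahead needs an underscore to succeed, so a string without one maps to nil. If that matters, a different pattern (a negated character class, which is a swap-in and not the regex above) avoids it. A small sketch with made-up method names and sample data:

```ruby
# Hypothetical input mixing strings with and without an underscore.
strings = ['height_low', 'width', 'a_b_c']

# Lookahead version from above: nil when there is no underscore.
def prefixes_lookahead(strings)
  strings.map { |s| s[/^.+?(?=_)/] }
end

# Negated character class: everything up to the first underscore or the end.
def prefixes_negated_class(strings)
  strings.map { |s| s[/\A[^_]*/] }
end

p prefixes_lookahead(strings)     # => ["height", nil, "a"]
p prefixes_negated_class(strings) # => ["height", "width", "a"]
```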
Upvotes: 1