marcamillion

Reputation: 33755

How do I match every word in a string except for the last word?

I have the following strings:

Chicago CPA
New York CPA
West Virginia Accountant

How do I always just chop off the last word (and the preceding whitespace) in the string, preserving all other words before the last word?

So the correct versions of the above data set would be:

Chicago
New York
West Virginia

Also, is it possible to test matching groups on Rubular or is there another online regex editor/tester that I can use to test regexes with matching groups?

Edit 1

Many of the answers are great in theory. I read them, I understand them, and when I test them on a vanilla string they seem to work. But when I try them on my data, they don't. I was stumped for a while, and I just realized why.

This is the HTML I am working on:

<h1 class="search-term">
   Chicago&nbsp;<strong>Cpa</strong>
</h1>

So this is the text I am attempting to do this string manipulation on:

Chicago&nbsp;<strong>Cpa</strong>

So here is what happens when I try each of the answers below.


@Darshan's:

[56] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[57] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[58] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.match(/(.*) \w+\z/)[1]
NoMethodError: undefined method `[]' for nil:NilClass
from (pry):57:in `<class:PageCrawler>'
[59] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text[/.*(?=\s\w+\z)/]
=> nil

@Lucas's own:

[60] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[61] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[62] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.split()[0...-1].join(' ')
=> ""

@Eric's own:

[65] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[66] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[67] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.split().reverse.drop(1).reverse.join(" ")
=> ""

@Casimir's own (this one is the best so far, actually):

[68] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[69] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[70] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.sub(/\W+\w+\W*$/, '')
=> "Chicago"

@Santosh's own:

[71] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[72] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[73] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text[/(.*)\s/,1]
=> nil

My apologies for not doing this earlier, but I didn't anticipate this being an issue.

Upvotes: 0

Views: 174

Answers (6)

Tiago Lopo

Reputation: 7959

You can use the regex /^(.*)\s+\w+\s*$/ to capture everything but the last word:

Example:

str = <<~EOF
  Chicago CPA
  New York CPA
  West Virginia Accountant
EOF

str.each_line do |line|
  puts line.match(/^(.*)\s+\w+\s*$/).captures.first
end

Output:

Chicago
New York
West Virginia

Upvotes: 0

Darshan Rivka Whittle

Reputation: 34031

I'll preface by saying I'm not particularly good with regular expressions, and I'm not sure off the top of my head (nor do I feel inclined to benchmark or think hard about) whether this would tend to be more or less efficient than @LucasP's non-regex approach. But this is the obvious approach that comes to mind for me:

s.match(/(.*) \w+\z/)[1]

That matches at the end of the string one or more word characters preceded by a space, and puts everything before that into a group that you then grab.

data = ['Chicago CPA',
        'New York CPA',
        'West Virginia Accountant']

data.map{|s| s.match(/(.*) \w+\z/)[1]}
# => ["Chicago", "New York", "West Virginia"]

Edit: A variant on this approach, suggested by @CarySwoveland, is to use a lookahead expression to ignore the part we want to discard, rather than my initial approach of putting the part we want into a capturing group that we then access. Here's a version of that approach:

data.map{|s| s[/.*(?=\s\w+\z)/]}
# => ["Chicago", "New York", "West Virginia"]

Edit 2: With your added information, it's now clear that the issue is that your text contains non-breaking spaces, which \s doesn't match (in Ruby, \s matches only ASCII whitespace, equivalent to [ \t\r\n\f\v]). Using either the POSIX bracket expression [[:space:]] or an explicit \u00A0 for the non-breaking space character works, the latter assuming every separator is a non-breaking space. I prefer the former, since you might sometimes have other whitespace there:

data.map{|s| s[/.*(?=[[:space:]]\w+\z)/]}
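To see the difference side by side, here is a minimal sketch. It assumes the scraped text contains U+00A0 between the two words, as the HTML in the question suggests; the variable names are only for illustration.

```ruby
# Assumption: the separator is a non-breaking space (U+00A0), as in "Chicago&nbsp;Cpa".
s = "Chicago\u00A0Cpa"

ascii_only = s[/.*(?=\s\w+\z)/]           # \s is ASCII-only, so the lookahead never matches
posix      = s[/.*(?=[[:space:]]\w+\z)/]  # [[:space:]] also covers Unicode whitespace
explicit   = s[/.*(?=\u00A0\w+\z)/]       # matches the non-breaking space and nothing else

p ascii_only
p posix
p explicit
```

The first lookup comes back nil, which is the same failure shown in the question's pry session, while the other two return the text before the last word.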

Upvotes: 4

Santosh Sharma

Reputation: 2248

Try the following:

str = ['Chicago CPA', 'New York CPA', 'West Virginia Accountant']

str.map{|s| s[0...s.rindex(' ')]}

output: ["Chicago", "New York", "West Virginia"]

Using a regexp:

str2 = "West Virginia Accountant"
p str2[/(.*)\s/,1]

output: "West Virginia"

Upvotes: 0

Casimir et Hippolyte

Reputation: 89557

Assuming you have more than one word, you can use a replacement:

'West Virginia Accountant'.sub(/\W+\w+\W*$/, '')
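A short sketch of how this replacement behaves on a few inputs. The helper name drop_last_word is hypothetical and only wraps the regex above. Because \W matches any non-word character, it likely also covers a non-breaking space, which would explain why this answer worked on the scraped data in the question.

```ruby
# Hypothetical helper for illustration; the regex is the one from this answer.
def drop_last_word(s)
  # \W+ : separator(s) before the last word (any non-word characters)
  # \w+ : the last word itself
  # \W*$: optional trailing non-word characters up to the end of the line
  s.sub(/\W+\w+\W*$/, '')
end

p drop_last_word('West Virginia Accountant')
p drop_last_word("Chicago\u00A0Cpa")  # separator is U+00A0
p drop_last_word('Chicago')           # single word: no match, string is returned unchanged
```

With a single word there is no preceding separator, so nothing is removed; that is the "more than one word" assumption stated above.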

Upvotes: 1

Eric

Reputation: 380

"New York Accountant".split().reverse.drop(1).reverse.join(" ")

Upvotes: 0

LucasP

Reputation: 303

One way of achieving this is the following:

myString.split()[0...-1].join(' ')

Where myString is each string you want to perform this operation on.

  1. First you split the string into a list containing each word.

  2. Then select the sublist that contains all elements except the last one.

  3. Finally you go back from list to a string.
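The three steps above can be sketched as follows. This is a variation, not the exact one-liner: it splits on /[[:space:]]+/ instead of calling split with no argument, on the assumption that the input may contain non-breaking spaces (as in the question's edit) and has no leading whitespace. The method name without_last_word is hypothetical.

```ruby
# Hypothetical helper; assumes no leading whitespace in my_string.
def without_last_word(my_string)
  # 1. Split into words. [[:space:]] also matches U+00A0, unlike the default split.
  words = my_string.split(/[[:space:]]+/)
  # 2. Keep every element except the last one.
  leading = words[0...-1]
  # 3. Join back into a string (separators become plain spaces).
  leading.join(' ')
end

p without_last_word('New York CPA')
p without_last_word("Chicago\u00A0Cpa")
p without_last_word('Chicago')  # a single word leaves nothing to keep
```

Note that the original separators are not preserved: whatever whitespace stood between the words is replaced by a single regular space.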

Upvotes: 2
