grabury

Reputation: 5599

How do I separate the hashtags from a tweet?

What would be a good way to remove the hash-tags from a string and then join the hash-tag words together in another string separated by commas:

'Some interesting tweet #hash #tags'

The result would be:

'Some interesting tweet'

And:

'hash,tags'

Upvotes: 0

Views: 577

Answers (3)

the Tin Man

Reputation: 160631

An alternate path is to use scan then remove the hash tags:

tweet = 'Some interesting tweet #hash #tags'

tags = tweet.scan(/#\w+/).uniq
tweet = tweet.gsub(/(?:#{ Regexp.union(tags).source })\b/, '').strip.squeeze(' ') # => "Some interesting tweet"
tags.join(',').tr('#', '') # => "hash,tags"

Dissecting it shows:

  • tweet.scan(/#\w+/) returns an array ["#hash", "#tags"].
  • uniq would remove any duplicated tags.
  • Regexp.union(tags) returns the regular expression /\#hash|\#tags/, which turns into (?-mix:\#hash|\#tags) when interpolated into another pattern.
  • Regexp.union(tags).source returns \#hash|\#tags. We don't want the pattern-flags at the start, so using source fixes that.
  • /(?:#{ Regexp.union(tags).source })\b/ returns the regular expression /(?:\#hash|\#tags)\b/.
  • tr is an extremely fast way to translate one set of characters to another; with an empty replacement string, as here, it strips them.
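
The steps above can be checked one at a time in irb. The sketch below uses a made-up tweet with a repeated tag (an assumption, not the string from the question) so that the effect of uniq is visible; the variable names all_tags, union, pattern, text and tag_list are only illustrative.

```ruby
# Hypothetical tweet with a repeated tag, to make uniq's effect visible.
tweet = 'Some interesting tweet #hash #tags #hash'

all_tags = tweet.scan(/#\w+/)   # => ["#hash", "#tags", "#hash"]
tags     = all_tags.uniq        # => ["#hash", "#tags"]

# Regexp.union escapes each tag and joins them with |.
# Using .source drops the flag wrapper before interpolating.
union   = Regexp.union(tags)
pattern = /(?:#{union.source})\b/

text     = tweet.gsub(pattern, '').strip.squeeze(' ') # => "Some interesting tweet"
tag_list = tags.join(',').tr('#', '')                 # => "hash,tags"
```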

The final regex isn't the most optimized that can be generated. I'd actually write code to generate:

/#(?:hash|tags)\b/

but how to do that is left as an exercise for you. And, for short strings it won't make much difference as far as speed goes.
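
One possible way to approach that exercise, as a sketch rather than the only answer: capture just the word after each # so the # can be factored out of the alternation. The names words and tag_pattern are made up for this example.

```ruby
tweet = 'Some interesting tweet #hash #tags'

# With a capture group, scan returns one sub-array per match,
# so flatten gives the bare words without the leading '#'.
words = tweet.scan(/#(\w+)/).flatten.uniq   # => ["hash", "tags"]

# Build the alternation from the bare words and put a single '#' in front.
tag_pattern = /#(?:#{Regexp.union(words).source})\b/

text     = tweet.gsub(tag_pattern, '').strip.squeeze(' ') # => "Some interesting tweet"
tag_list = words.join(',')                                # => "hash,tags"
```

Because the words are captured without the #, the final tr call is no longer needed.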

Upvotes: 2

Tall Paul

Reputation: 2450

This starts with two empty arrays, one for the hashtags and one for the remaining words. It splits the string on whitespace and checks each word: if the word contains a hashtag, it is stored in the hashtag array with the # removed; otherwise it goes into the word array.

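array_of_hashtags = []
array_of_words = []

str = "Some interesting tweet #hash #tags"

str.split.each do |x|
  if /\#\w+/ =~ x
    # Hashtag: store it without the '#'
    array_of_hashtags << x.gsub(/\#/, "")
  else
    array_of_words << x
  end
end

array_of_words.join(" ")    # => "Some interesting tweet"
array_of_hashtags.join(",") # => "hash,tags"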

Hope this helps.

Upvotes: 0

Arup Rakshit

Reputation: 118299

str = 'Some interesting tweet #hash #tags'
a,b = str.split.partition{|e| e.start_with?("#")}
# => [["#hash", "#tags"], ["Some", "interesting", "tweet"]]
a
# => ["#hash", "#tags"]
b
# => ["Some", "interesting", "tweet"]
a.join(",").delete("#")
# => "hash,tags"
b.join(" ")
# => "Some interesting tweet"
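
If this is needed in more than one place, the same partition approach could be wrapped in a small method. This is only a sketch; split_hashtags is a hypothetical name, not something from the code above.

```ruby
# split_hashtags is a hypothetical helper name used for illustration.
# Returns [text_without_tags, comma_separated_tags].
def split_hashtags(str)
  tags, words = str.split.partition { |word| word.start_with?('#') }
  [words.join(' '), tags.join(',').delete('#')]
end

text, tags = split_hashtags('Some interesting tweet #hash #tags')
text # => "Some interesting tweet"
tags # => "hash,tags"
```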

Upvotes: 6
