dmanaster

Reputation: 587

Twitter API: How do I match punctuation at the end of a hashtag?

I am using the Twitter gem to generate a list of recent tweets with a specific hashtag that contain images.

It is working fine, but I have noticed that when people append punctuation to the hashtag in their tweets, the API does not include those tweets in my search results. For example, when I search for #sourcecon, the results do not include tweets that use #sourcecon!

Running separate searches via the API for #sourcecon. or #sourcecon! does not help: the API ignores the punctuation and returns the same list.

My code is here:

twitter_client.search("'#sourcecon' filter:images", result_type: "recent", :since_id => last_tweet).collect

vs

twitter_client.search("'#sourcecon!' filter:images", result_type: "recent", :since_id => last_tweet).collect

I know that Twitter does not treat punctuation as part of the hashtag. From the Twitter API documentation:

Note that punctuation is not considered to be part of a #hashtag or @mention, so a track term containing punctuation will not match either #hashtags or @mentions.
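The quoted rule can be approximated with a simple regular expression: a hashtag is `#` followed by word characters, so any trailing punctuation falls outside the tag. This is a simplified sketch for illustration only, assuming a hypothetical `hashtags` helper; Twitter's real hashtag parsing handles many more cases (Unicode, leading digits, URLs, and so on).

```ruby
# Hypothetical, simplified hashtag extraction: "#" followed by word characters.
# Trailing punctuation such as "!" or "." is not part of the match, mirroring
# the rule quoted from the Twitter docs. Tags are downcased for comparison.
def hashtags(text)
  text.scan(/#(\w+)/).flatten.map(&:downcase)
end

p hashtags("Happy hour at #SourceCon! Thanks @CareerBuilder") # => ["sourcecon"]
p hashtags("Check out what is happening at #sourcecon.")     # => ["sourcecon"]
```

Under this reading, a tweet containing "#sourcecon!" would be indexed under the same tag as one containing "#sourcecon".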

But shouldn't that mean that it would ignore the punctuation completely and return all results, including the tweets where the punctuation is appended to the hashtag?

Does anyone know how to get search results here that would include mentions of the hashtag both with and without punctuation at the end?

Upvotes: 1

Views: 374

Answers (1)

David Gross

Reputation: 1873

With Twitter search, punctuation and special characters attached to the term you're searching for are effectively ignored when matching, so searching for '#twitter!' can return tweets containing '#twitter!', '#twitter?', '#twitter', and so on. What you can do is check whether the search term ends with punctuation and, if it does, sort the results so that tweets containing the exact punctuated term come first.

require 'twitter'

module TwitterSearch
  extend self

  # Fill in your own API credentials.
  @twitter_client = Twitter::REST::Client.new do |config|
    config.consumer_key        = ""
    config.consumer_secret     = ""
    config.access_token        = ""
    config.access_token_secret = ""
  end

  # Sample results for '#sourcecon!':
  # Check out what @researchgoddess is up to at #sourcecon!
  # What a welcome from @SourceCon! Thanks @CareerBuilder for hosting.#
  # RT @JRoberts257: Happy hour at #SourceCon! Thanks @CareerBuilder for
  # Happy hour at #SourceCon! Thanks @CareerBuilder for sponsoring. ht
  # @RT @cybsearchjoe: #SourceCon is rocking
  # etc

  # Returns up to 30 recent tweet texts. If the term ends in punctuation,
  # tweets containing that exact punctuated term are moved to the front.
  def search(text)
    tweets = @twitter_client.search("#{text} filter:images", result_type: "recent").take(30).map(&:text)
    # Checks for punctuation such as "!", "." or "?" at the end of the term.
    # text[1..-1] skips the leading "#" so it is not counted as punctuation.
    tweets = sort_tweets(text, tweets) if text[1..-1] =~ /[[:punct:]]\z/
    tweets
  end

  # Puts tweets that contain the punctuated term first. Array#partition keeps
  # the original order within each group (Array#sort is not guaranteed stable).
  def sort_tweets(text, tweets)
    pattern = /#{punc_text(text)}/i
    matching, others = tweets.partition { |tweet| pattern.match?(tweet) }
    matching + others
  end

  # Drops the leading "#" and escapes regex metacharacters,
  # e.g. "#sourcecon." becomes "sourcecon\." so the "." is matched literally.
  def punc_text(text)
    Regexp.escape(text[1..-1])
  end
end

puts TwitterSearch.search('#sourcecon!')
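To try the ordering logic without API credentials, here is a minimal standalone sketch that works on a plain array of tweet texts taken from the sample results above. The `prioritize_exact` name is hypothetical; it uses `Enumerable#partition` so tweets keep their original relative order within each group.

```ruby
# Hypothetical standalone version of the sorting step: tweets that contain the
# exact punctuated term (case-insensitive) come first, the rest keep their order.
def prioritize_exact(term, tweets)
  pattern = /#{Regexp.escape(term)}/i
  exact, others = tweets.partition { |tweet| pattern.match?(tweet) }
  exact + others
end

tweets = [
  "@RT @cybsearchjoe: #SourceCon is rocking",
  "Check out what @researchgoddess is up to at #sourcecon!",
  "Happy hour at #SourceCon! Thanks @CareerBuilder for sponsoring."
]

puts prioritize_exact("#sourcecon!", tweets)
```

Both "#sourcecon!" variants are printed first, followed by the tweet that uses the bare hashtag.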

Upvotes: 2
