Shivam Mangla

Reputation: 92

Parsing hashtags from the Twitter API in PHP

I want to parse hashtags from the tweets I'm retrieving from Twitter. I didn't find anything for this in the API, so I'm parsing them myself in PHP. I've tried several things.

<?php
$subject = "This is a simple #hashtag";
$pattern = "#\S*\w";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

I've also tried

$pattern = "/[#]"."[A-Za-z0-9-_]"."/g";

But then PHP reports that the /g modifier isn't recognized. I've been trying to do this for quite a long time now but haven't been able to get it working. Please help.

P.S.: I know very little about regular expressions.

Upvotes: 3

Views: 3149

Answers (3)

Ben Marshall

Reputation: 1764

There's an easier way using JavaScript object prototypes. I wrote a post detailing exactly how to do this, not only for hashtags but also for usernames and URLs within tweets. I needed it for a project I'm working on where I'm grabbing tweets from the Twitter API.

https://benmarshall.me/parse-twitter-hashtags/

Here's the relevant code:

// Auto-link URLs in a string
// Usage: mystring.parseURL()
String.prototype.parseURL = function() {
  return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&~\?\/.=]+/g, function( url ) {
    return url.link( url );
  });
};

// Auto-link Twitter usernames in a string
// Usage: mystring.parseUsername()
String.prototype.parseUsername = function() {
  return this.replace(/[@]+[A-Za-z0-9-_]+/g, function( u ) {
    var username = u.replace("@","");

    return u.link( 'http://twitter.com/' + username );
  });
};

// Auto-link Twitter hashtags in a string
// Usage: mystring.parseHashtag()
String.prototype.parseHashtag = function() {
  return this.replace(/[#]+[A-Za-z0-9-_]+/g, function( t ) {
    var tag = t.replace("#","%23");

    return t.link( 'http://search.twitter.com/search?q=' + tag );
  });
};
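Since the question asks about PHP, a rough PHP counterpart of parseHashtag() could be sketched with preg_replace_callback. This is only an illustration: the function name link_hashtags is made up here, the search URL is carried over unchanged from the JavaScript above, and the pattern matches a single leading # rather than a run of them.

```php
<?php
// Hypothetical PHP counterpart of parseHashtag(); the name is illustrative only.
function link_hashtags(string $text): string
{
    return preg_replace_callback('/#[A-Za-z0-9_-]+/', function (array $m): string {
        $tag = $m[0];
        // rawurlencode() turns the leading "#" into "%23" for the query string.
        $href = 'http://search.twitter.com/search?q=' . rawurlencode($tag);
        return '<a href="' . $href . '">' . htmlspecialchars($tag) . '</a>';
    }, $text);
}

echo link_hashtags("This is a simple #hashtag");
```

Unlike the JavaScript version, this returns a new string from a plain function instead of extending a built-in type.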

Upvotes: 0

Harry Dobrev

Reputation: 7706

You need to consider where a hashtag might appear. There are three cases:

  • at the beginning of a tweet,
  • after whitespace,
  • in the middle of a word - this must not be counted as a hashtag.

So this will match them correctly:

'/(^|\s)\#\w+/'

Explanation:

  • ^ anchors the start of the string and can be used as one branch of an alternation (|)
  • \s matches whitespace: spaces, tabs and newlines

Here is the complete code:

<?php
$subject = "#hashtag This is a simple #hashtag hello world #hastag2 last string not-a-hash-tag#hashtag3 and yet not -#hashtag";
$pattern = "/(?:^|\s)(\#\w+)/";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
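To get just the tag text, the first capture group can be read from the result. The sketch below wraps the same pattern in a helper; the name extract_hashtags is made up for illustration, and PREG_OFFSET_CAPTURE is left out so that each entry is a plain string rather than a [text, offset] pair.

```php
<?php
// Hypothetical helper, not part of the code above.
// Returns every hashtag that starts the string or follows whitespace.
function extract_hashtags(string $text): array
{
    // Group 1 holds the hashtag itself, without the preceding whitespace.
    preg_match_all('/(?:^|\s)(#\w+)/', $text, $matches);
    return $matches[1];
}

$subject = "#hashtag This is a simple #hashtag hello world #hastag2 last string not-a-hash-tag#hashtag3 and yet not -#hashtag";
print_r(extract_hashtags($subject));
```

With this subject the helper returns #hashtag, #hashtag and #hastag2; the two tags glued to a preceding word or dash are skipped.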

Upvotes: 1

Samuel Cook

Reputation: 16828

This works for me:

$subject = "This is a simple #hashtag hello world #hastag2 last string #hashtag3";
$pattern = "/(#\w+)/";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
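On the /g error from the question: PHP's PCRE functions have no g modifier. Whether one match or every match is found depends on the function called, preg_match for the first match and preg_match_all for all of them. A minimal sketch of the difference (the variable names are only for illustration):

```php
<?php
$subject = "This is a simple #hashtag hello world #hastag2 last string #hashtag3";
$pattern = '/#\w+/';

// preg_match stops after the first match.
preg_match($pattern, $subject, $first);

// preg_match_all keeps scanning the whole string, which is the job /g does in JavaScript.
$count = preg_match_all($pattern, $subject, $all);

print_r($first);   // the first match only
print_r($all[0]);  // every match
```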

Upvotes: 0
