Reputation: 92
I want to parse hashtags from the tweets I'm retrieving from Twitter. I didn't find anything for this in the API, so I'm parsing them myself using PHP. I've tried several things.
<?php
$subject = "This is a simple #hashtag";
$pattern = "#\S*\w";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
I've also tried
$pattern = "/[#]"."[A-Za-z0-9-_]"."/g";
But then PHP reports that /g isn't recognized. I've been trying to do this for quite a long time now but haven't been able to get it working, so please help.
P.S.: I have very little knowledge of regular expressions.
Upvotes: 3
Views: 3149
Reputation: 1764
There's an easier way using JavaScript object prototypes. I wrote a post detailing exactly how to do this, not only with hashtags but also with usernames and URLs within tweets. I needed it for a project I'm working on where I'm grabbing tweets from the Twitter API.
https://benmarshall.me/parse-twitter-hashtags/
Here's the relevant code:
// Auto-link URLs in a string
// Usage: mystring.parseURL()
String.prototype.parseURL = function() {
    return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&~\?\/.=]+/g, function( url ) {
        return url.link( url );
    });
};

// Auto-link Twitter usernames in a string
// Usage: mystring.parseUsername()
String.prototype.parseUsername = function() {
    return this.replace(/[@]+[A-Za-z0-9-_]+/g, function( u ) {
        var username = u.replace("@","");
        return u.link( 'http://twitter.com/' + username );
    });
};

// Auto-link Twitter hashtags in a string
// Usage: mystring.parseHashtag()
String.prototype.parseHashtag = function() {
    return this.replace(/[#]+[A-Za-z0-9-_]+/g, function( t ) {
        var tag = t.replace("#","%23");
        return t.link( 'http://search.twitter.com/search?q=' + tag );
    });
};
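Since the question is about PHP, here is a rough sketch of the same hashtag-linking idea using preg_replace_callback(). The function name linkHashtags is made up for illustration and is not from the linked post. The search URL is simply copied from the JavaScript above. Only the generated link is escaped, so the rest of the tweet text is assumed to be safe for HTML output already.

```php
<?php
// Hypothetical helper (not from the linked post): wrap each hashtag in a search link.
function linkHashtags(string $text): string
{
    return preg_replace_callback(
        '/#(\w+)/',
        function (array $m): string {
            // $m[0] is the full match, e.g. "#hashtag"; urlencode() turns "#" into "%23".
            $url = 'http://search.twitter.com/search?q=' . urlencode($m[0]);
            return '<a href="' . htmlspecialchars($url) . '">' . $m[0] . '</a>';
        },
        $text
    );
}

echo linkHashtags("This is a simple #hashtag");
```

Note that this pattern uses \w, so unlike the JavaScript character class it does not allow a "-" inside a tag.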
Upvotes: 0
Reputation: 7706
You need to consider where a hashtag might appear. There are three cases:
So this will match them correctly:
'/(^|\s)\#\w+/'
Explanation:
^ can be used in OR statements
\s is used to catch spaces, tabs and new lines
Here is the complete code:
<?php
$subject = "#hashtag This is a simple #hashtag hello world #hastag2 last string not-a-hash-tag#hashtag3 and yet not -#hashtag";
$pattern = "/(?:^|\s)(\#\w+)/";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
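If only the tags themselves are needed, without offsets, something like the following sketch may be enough. The helper name extractHashtags is an assumption for illustration. It also shows why the /g modifier from the question is not needed: PHP's PCRE functions have no "g" modifier, and preg_match_all() already returns every match.

```php
<?php
// Hypothetical helper: return the hashtags found in $text, without offsets.
function extractHashtags(string $text): array
{
    // preg_match_all() finds every match on its own, so no "g" modifier is required.
    preg_match_all('/(?:^|\s)(#\w+)/', $text, $matches);
    // $matches[0] holds the full matches (including any leading whitespace);
    // $matches[1] holds capture group 1, i.e. just the "#tag" part.
    return $matches[1];
}

print_r(extractHashtags("#hashtag This is a simple #hashtag hello world #hastag2 last string not-a-hash-tag#hashtag3 and yet not -#hashtag"));
// Array ( [0] => #hashtag [1] => #hashtag [2] => #hastag2 )
```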
Upvotes: 1
Reputation: 16828
This works for me:
$subject = "This is a simple #hashtag hello world #hastag2 last string #hashtag3";
$pattern = "/(#\w+)/";
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
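One thing to be aware of: this pattern has no check on what comes before the "#", so it also matches a "#" in the middle of a word. A small sketch comparing it with a pattern that requires the start of the string or whitespace before the "#" (the sample text and variable names are made up for illustration):

```php
<?php
$text = "a simple #hashtag and not-a-hash-tag#hashtag3";

// No boundary check: any "#" followed by word characters matches.
preg_match_all('/(#\w+)/', $text, $loose);

// Requires start of string or whitespace before the "#".
preg_match_all('/(?:^|\s)(#\w+)/', $text, $strict);

print_r($loose[1]);  // #hashtag, #hashtag3
print_r($strict[1]); // #hashtag
```

Which behavior is right depends on whether "word#tag" should count as a hashtag in your data.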
Upvotes: 0