coder101

Reputation: 1605

regex for extracting hash tags without spaces

I'm using this:

$t = "#hashtag #goodhash_tag united states #l33t this";
$queryVariable = "";
if(preg_match_all('/(^|\s)(#\w+)/', $t, $arrHashTags) > 0){
    array_filter($arrHashTags);
    array_unique($arrHashTags);
    $count = count($arrHashTags[2]);
    if($count > 1){
        $counter = 1;
        foreach ($arrHashTags[2] as $strHashTag) {
            if (preg_match('/#\d*[a-z_]+/i', $strHashTag)) {
                if($counter == $count){
                    $queryVariable .= $strHashTag;              
                } else{
                    $queryVariable .= $strHashTag." and ";
                }
                $newTest = str_replace($arrHashTags[2],"", $t);                 
            }
            $counter = $counter + 1;
        }
    }
}
echo $queryVariable."<br>"; // this is list of tags
echo $newTest;   // this is the remaining text

The output based on the $t above is:

#hashtag and #goodhash_tag and #l33t
united states this

First problem:

If $t = '#hashtag#goodhash_tag united states #l33t this'; (i.e. there is no space between two tags), the output becomes:

#hashtag and #l33t
#goodhash_tag united states this

Second problem:

If $t = '#hashtag #goodhash_tag united states #l33t this #123'; (i.e. it contains the invalid tag #123), the list of tags collected in $queryVariable is thrown off, and the output becomes:

#hashtag and #goodhash_tag and #l33t and // note the extra 'and'
united states this

Can anyone help with these two problems?

Upvotes: 3

Views: 769

Answers (1)

hjpotter92

Reputation: 80649

Instead of using so many comparisons around your regex, you can let a single pattern do the work:

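$t = "#hashtag #goodhash_tag united states #l33t this #123#tte#anothertag sth";
$queryVariable = "";
// A tag is '#', then a letter or underscore, then one or more word characters.
// [A-Za-z_] is used instead of [A-z_]: the range A-z also matches [ \ ] ^ and `.
preg_match_all('/(#[A-Za-z_]\w+)/', $t, $arrHashTags);
print_r( $arrHashTags[1] ); // #hashtag, #goodhash_tag, #l33t, #tte, #anothertag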

To get the tags as a single string joined with " and ", you can use implode.

$queryVariable = implode( " and ", $arrHashTags[1] ); // separator first; the reversed argument order was removed in PHP 8.0

For the remaining text, you can strip the tags out of $t with preg_replace or str_replace (whichever you are more comfortable with).
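As a rough sketch of the preg_replace route, the snippet below reuses the same tag pattern to build both the tag list and the remaining text. The whitespace clean-up step and the choice to leave invalid tags such as #123 in the remaining text are assumptions of this sketch, not something the question specifies.

```php
<?php
// Tag pattern as described above: '#', a letter or underscore, then word characters.
$pattern = '/#[A-Za-z_]\w+/';

$t = "#hashtag#goodhash_tag united states #l33t this #123";

// Collect the tags; adjacent tags without a space are matched separately.
preg_match_all($pattern, $t, $matches);
$tags = array_unique($matches[0]);
$queryVariable = implode(" and ", $tags);

// Remove the matched tags, then collapse the whitespace they leave behind.
// Assumption: invalid tags such as #123 are not removed from the text.
$newTest = preg_replace($pattern, '', $t);
$newTest = trim(preg_replace('/\s+/', ' ', $newTest));

echo $queryVariable, "\n"; // #hashtag and #goodhash_tag and #l33t
echo $newTest, "\n";       // united states this #123
```

If the invalid tags should be dropped from the remaining text as well, a broader pattern such as '/#\w+/' could be used for the preg_replace call only.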


Here is the codepad link.

Upvotes: 5
