Reputation: 1767
I am using the twitter API to fill my database with tweets. I now want to be able to search for words in the tweets. I have realized that this is not as simple as using LIKE because LIKE will do the following:
If searching for 'pan' like would return tweets that have words that contain that string such as 'companion'.
I have decided I need to use regular expressions to solve this. Basically I want to be able to find only full words (not if they are contained in other words like the above) in the tweets I am searching. So how would I be able to do this?
The following is what I have so far:
SELECT tweet_id, text FROM tweets WHERE text REGEXP ''
I am just unsure about what the regular expression should contain.
Upvotes: 2
Views: 4451
Reputation: 191749
You can use [[:<:]]
and [[:>:]]
as word boundaries that match the beginning and end of words, respectively: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
SELECT tweet_id, text FROM tweets WHERE text REGEXP '[[:<:]]pan[[:>:]]'
Upvotes: 9
Reputation: 7025
If you go to http://gskinner.com/RegExr/ you can enter in some test data and play around with your regex strings until you get one that works just right.
As a starting point, you probably want something like '.* pan .*'
.
= a wildcard character
.*
= matches 1 or more of the wildcard character
This will work for most examples of your search but you will need to play with regexr to get one that works for all possibilities
Upvotes: 0