Reputation: 23
I tried using str_match from stringr, and it works on a simple test example. But it doesn't work on the data that comes back from rtweet.
Here is a made-up data frame that it does work on:
test <- data.frame(c(1), c('something'))
names(test) <- c('value', 'item')
subset(test, !anyNA(str_match(item,'thing')))
That gives a match and doesn't filter the item out, producing:
value item
1 1 something
Changing the pattern to something else:
subset(test, !anyNA(str_match(item,'thang')))
...filters the item out, as expected:
[1] value item
<0 rows> (or 0-length row.names)
But the "mentions_screen_name" field in the data frame that comes back from rtweet can't seem to be subset this way. Other logical operations (like mentions_screen_name == ...) work for filtering on that column. But !anyNA(str_match(mentions_screen_name, '...')) doesn't work, even if I match on the exact text of the field.
I'd like to share the data that str_match can't seem to select, but the script that fetches it uses rtweet and needs Twitter app credentials.
As I said, though, the simple example works. Is there something different about the rtweet data?
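One likely factor, offered as an assumption since the rtweet data isn't shown: anyNA() returns a single TRUE or FALSE for the whole vector, not one value per row. In the one-row test data frame that is indistinguishable from a per-row test, but with many tweets a single non-matching (or NA) row makes !anyNA(...) FALSE, and subset() recycles that one value and drops every row. The sketch uses a hypothetical two-row data frame and base R's grepl(), which returns one logical per element (stringr::str_detect() is the stringr counterpart). The ifelse() line only imitates the NA that str_match() gives for a non-match.

```r
# Hypothetical two-row version of the question's test data frame
test <- data.frame(value = c(1, 2),
                   item = c("something", "other"),
                   stringsAsFactors = FALSE)

# One logical per row: TRUE where the pattern occurs
hits <- grepl("thing", test$item)

# Imitates str_match()'s first column: matched text, or NA for no match
matched_text <- ifelse(hits, "thing", NA)

# anyNA() collapses the whole column to a single value
!anyNA(matched_text)  # FALSE

# That single FALSE is recycled over all rows, so no row survives
subset(test, !anyNA(matched_text))

# A per-row condition keeps exactly the matching rows (row 1 here)
subset(test, hits)
```

With the real data, the per-row form would look like subset(tweets, grepl(pattern, mentions_screen_name)), where tweets and pattern are placeholder names.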
Upvotes: 2
Views: 679
Reputation: 1345
The data structure for mentions, hashtags, etc. is currently a character vector, created by collapsing each tweet's values into a single comma-separated string. The next version of rtweet will include some utility functions to make handling this structure a little easier. In the meantime, you can either convert mentions into a list object:
strsplit(mentions, ",")
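A minimal sketch of the list approach, using a made-up mentions vector in place of the rtweet column (the values, and the assumption that entries are separated by commas with optional spaces, are hypothetical). Once each tweet's mentions are a separate character vector, %in% tests each tweet for an exact screen name:

```r
# Made-up stand-in for the comma-collapsed mentions column; NA = no mentions
mentions <- c("alice,bob", NA, "bobby")

# One character vector per tweet; trimws() guards against ", " separators
mention_list <- lapply(strsplit(mentions, ","), trimws)

# One logical per tweet: does this tweet mention exactly "bob"?
mentions_bob <- vapply(mention_list, function(m) "bob" %in% m, logical(1))
mentions_bob  # TRUE FALSE FALSE
```

Because %in% compares whole entries, "bobby" is not counted as a mention of "bob", and the resulting logical vector can be passed to subset() or [ to keep the matching rows.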
Or use functions like grep/grepl to search for matches within strings.
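A grepl() sketch on a hypothetical data frame shaped like the comma-collapsed rtweet output (the tweets name and its rows are made up). Anchoring the screen name between the start of the string or a comma and a comma or the end of the string avoids partial hits such as "bob" inside "bobby". grepl() returns FALSE for NA entries, so tweets without mentions are dropped:

```r
# Hypothetical tweets with a comma-collapsed mentions column
tweets <- data.frame(
  text = c("first tweet", "second tweet", "third tweet"),
  mentions_screen_name = c("alice,bob", NA, "bobby"),
  stringsAsFactors = FALSE
)

# Match "bob" only as a whole comma-delimited entry
pattern <- "(^|,)bob(,|$)"

# grepl() gives one TRUE/FALSE per row (FALSE for NA), which subset() can use
subset(tweets, grepl(pattern, mentions_screen_name))
```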
I'm not completely sure this answers your question, so apologies if I misunderstood.
Upvotes: 1