Elizabeth Marie

Reputation: 23

Using str_match on a field in data frame from rtweet

I tried using str_match from stringr, and it works on a simple test example. But it doesn't work on the data that comes back from rtweet.

Here is a made-up data frame it does seem to work on:

library(stringr)  # provides str_match

test <- data.frame(c(1), c('something'))
names(test) <- c('value', 'item')

subset(test, !anyNA(str_match(item,'thing')))

That gives a match and doesn't filter the item out, producing:

  value      item
1     1 something

Changing it to something else:

subset(test, !anyNA(str_match(item,'thang')))

...filters the item out, as expected:

[1] value item 
<0 rows> (or 0-length row.names)

But the "mentions_screen_name" field in the data frame that comes back from rtweet can't seem to be subsetted like this. Other logical operations (like mentions_screen_name == ...) work for filtering on that column. But !anyNA(str_match(mentions_screen_name, '...')) won't work, even if you match on the exact text of the field.

I'd like to share the data that str_match can't seem to select, but the script that fetches it uses rtweet and needs Twitter app credentials.

As I said, the simple example works, though. Is there something different about the rtweet data?

Upvotes: 2

Views: 679

Answers (1)

mkearney

Reputation: 1345

The data structure for mentions, hashtags, etc. is currently a character vector created by collapsing the values for each tweet into a single string separated by commas. The next version of rtweet will include some utility functions to make handling this structure a little easier. In the meantime, you can either convert mentions into a list object:

strsplit(mentions, ",")

Or use functions like grep/grepl to search for matches within strings.
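As a rough sketch of both options, here is a made-up data frame standing in for rtweet output. The tweets data frame, its text column, and the screen names are hypothetical; the only assumption carried over from above is that mentions_screen_name holds one comma-separated string per tweet, with NA when a tweet mentions nobody. One thing worth checking in the original attempt: anyNA() returns a single TRUE/FALSE for the whole vector rather than one value per row, so it may only appear to work on a one-row data frame.

```r
# Hypothetical stand-in for rtweet output: mentions collapsed into one
# comma-separated string per tweet (NA when a tweet mentions nobody).
tweets <- data.frame(
  text = c("first", "second", "third"),
  mentions_screen_name = c("alice,bob", NA, "carol"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row, which is what subset() needs.
# fixed = TRUE matches the literal text instead of a regular expression.
hits <- subset(tweets, grepl("bob", mentions_screen_name, fixed = TRUE))

# Splitting first allows exact matching on whole screen names,
# so "bob" would not also match a longer name such as "bobby".
mention_list <- strsplit(tweets$mentions_screen_name, ",")
is_match <- vapply(mention_list, function(m) "bob" %in% m, logical(1))
exact <- tweets[is_match, ]
```

Both hits and exact should keep only the tweet that mentions "bob". The grepl() version is shorter, while the strsplit() version avoids partial matches inside longer screen names.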

I'm not totally sure this answers your question, so if I misunderstood you I'm sorry.

Upvotes: 1
