Reputation: 103
I have started using the rtweet package, and so far I have had good results with my queries and with the language and geocode parameters. However, I still do not know how to collect Twitter data from within the last 7 days.
For example, in the next code chunk I want to extract data for 7 days, but I am not sure whether the collected tweets will run from 2017-06-29 to 2017-07-05 or from 2017-06-22 to 2017-06-29:
## Stream all tweets mentioning AMLO or lopezobrador for 7 days
stream_tweets("AMLO,lopezobrador",
              timeout = 60*60*24*7,
              file_name = "tweetsaboutAMLO.json",
              parse = FALSE)
## Read in the data as a tidy tbl data frame
AMLO <- parse_stream("tweetsaboutAMLO.json")
Do you know if there are any commands in rtweet to specify the time frame to use when using the search_tweets() or stream_tweets() functions?
Upvotes: 0
Views: 1879
Reputation: 5898
So, to answer your question about how to write it more efficiently, you could try a for loop or a list apply (lapply). Here I show the for loop.
First, create a vector with the 4 dates you are querying.
fechas <- seq.Date(from = as.Date("2018-06-24"), to = as.Date("2018-06-27"), by = 1)
Then create an empty data.frame to store your tweets.
df_tweets <- data.frame()
Now, loop along your list and populate the empty data.frame.
for (i in seq_along(fechas)) {
  df_temp <- search_tweets("lang:es",
                           geocode = mexico_coord,
                           until = fechas[i],
                           n = 100)
  df_tweets <- rbind(df_tweets, df_temp)
}
summary(df_tweets)
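The "list apply" variant mentioned above could look roughly like the sketch below. To keep it runnable without Twitter credentials, it uses a hypothetical stand-in function, fetch_day(), that returns fake rows; in real use its body would be the search_tweets() call shown in the comment. The column names are made up for illustration.

```r
## Hypothetical stand-in for search_tweets(); not part of rtweet.
## In real use, replace the body with something like:
##   search_tweets("lang:es", geocode = mexico_coord, until = until, n = n)
fetch_day <- function(until, n = 3) {
  data.frame(until = rep(as.character(until), n),
             text = paste("tweet", seq_len(n)),
             stringsAsFactors = FALSE)
}

fechas <- seq.Date(from = as.Date("2018-06-24"), to = as.Date("2018-06-27"), by = 1)

## format() turns the dates into "YYYY-MM-DD" strings; lapply() returns
## one data frame per date, and rbind is called once at the end.
tweets_por_dia <- lapply(format(fechas), fetch_day)
df_tweets <- do.call(rbind, tweets_por_dia)

nrow(df_tweets)
```

Binding once at the end avoids copying the growing data frame on every iteration, which is the main practical difference from the loop.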
On the other hand, the following solution might be more convenient and efficient altogether:
library(tidyverse)
df_tweets2 <- search_tweets("lang:es",
                            geocode = mexico_coord,
                            until = "2018-06-29", ## or latest date
                            n = 10000)
df_tweets2 %>%
  group_by(as.Date(created_at)) %>% ## Group (or set apart) the tweets by date of creation
  sample_n(100) ## Obtain 100 random tweets for each group, in this case, for each date.
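One caveat: sample_n(100) raises an error if any date has fewer than 100 tweets. Below is a minimal base-R sketch of the same per-day sampling that caps the sample at the group size instead. It runs on fake data; the helper sample_per_day() and the status_id column are illustrative assumptions, and only created_at mirrors the column used above.

```r
set.seed(42)

## Fake stand-in for search_tweets() output: 250, 150 and 40 tweets on three days.
dias <- c("2018-06-26 12:00:00", "2018-06-27 12:00:00", "2018-06-28 12:00:00")
df_fake <- data.frame(
  created_at = as.POSIXct(rep(dias, times = c(250, 150, 40)), tz = "UTC"),
  status_id = as.character(seq_len(440)),
  stringsAsFactors = FALSE
)

## Hypothetical helper: draw up to n_per_day random rows for each creation date.
sample_per_day <- function(df, n_per_day) {
  grupos <- split(df, as.Date(df$created_at))
  muestras <- lapply(grupos, function(g) {
    ## min() caps the sample size so that small days do not cause an error
    g[sample.int(nrow(g), min(n_per_day, nrow(g))), , drop = FALSE]
  })
  do.call(rbind, muestras)
}

muestra <- sample_per_day(df_fake, 100)
table(as.Date(muestra$created_at))
```

With this fake data the first two days contribute 100 rows each and the last day contributes all 40 of its rows.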
Upvotes: 1
Reputation: 103
I already found a way to collect tweets within the past seven days. However, it is not efficient.
rt_24 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until = "2018-06-24",
                       n = 100)
rt_25 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until = "2018-06-25",
                       n = 100)
rt_26 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until = "2018-06-26",
                       n = 100)
rt_27 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until = "2018-06-27",
                       n = 100)
Then, append the data frames:
rbind(rt_24, rt_25, rt_26, rt_27)
Do you know if there is a more efficient way to write this? Maybe using the max_id() function in combination with until?
Upvotes: 0