Luiz Felipe Freitas

Reputation: 151

Loop through Twitter followers with rtweet package in R

I have a list of Twitter IDs of users who used a specific hashtag, and I'm trying to build a network graph of who they follow. Using the brand new rtweet package, the idea is to call the get_friends function for each user_id and end up with a two-column table: userids | following.

The problem is that instead of two columns, I end up with just one. Here's what I'm doing based on similar questions:

#this is where the ids list comes from
head(ids)
user_id             freq
2953382183           291
2832407758           178
522476436            149
773707421579677696   117
1296286704           113
773555423970529280   113

#for each user_id, get_friends() shows who that user is following
userids <- ids[1,1]
following <- get_friends(userids)
head(following)
               ids
         540219772
757699150507020288
        2392165598
         628569910
         576547113
         181996651

#NOW I'LL TRY TO FILL A NEW DATA FRAME FOR EACH "user_id" WITH ALL FOLLOWING "ids"

#initializing an empty data frame
final <- data.frame(userids = character(), following =character())

totalusers <- nrow(ids) #ids is a data frame where I got all `user_id`
userids <- NULL
following <- NULL
df <- NULL

for (i in 1:totalusers)
{
userids[i] <- ids[i,1]
following <- get_friends(userids[i]) #get_friends returns a data frame, by package default
df[i] <- data.frame(userids[i], following)
final <- rbind(final, df[i])
}

Does anyone know how I can append the following variable to this data frame as a second column? Many thanks.

Upvotes: 1

Views: 1093

Answers (2)

amonk

Reputation: 1797

For a given set of user IDs (ids), you can do the following:

library(rtweet)
library(plyr)

# the users' IDs
ids <- c("156562085", "808676983", "847366544183050240")
# get all the friends' IDs for each user ID
list_of_friends <- lapply(ids, get_friends)
names(list_of_friends) <- ids
# get the number of friends of each user
list_of_friends2 <- lapply(list_of_friends, function(y) dim(y)[1])
# transform the counts into a data frame
df1 <- ldply(list_of_friends2, data.frame)
names(df1) <- c("user_id", "following")

df1 yields:

             user_id         following
1           156562085           339
2           808676983          1066
3  847366544183050240             0

Additionally, to produce the edge list:

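f1 <- function(x) {
  # repeat the user ID once per friend and bind it to that user's friend IDs
  n_friends <- dim(list_of_friends[[x]])[1]
  return(cbind(rep(names(list_of_friends[x]), n_friends), list_of_friends[[x]]))
}
l1 <- lapply(names(list_of_friends), f1)
df2 <- ldply(l1, data.frame)
names(df2) <- c("user_id", "friend_id")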

yielding df2:

       user_id  friend_id
1    156562085   26787673
2    156562085   18139619
3    156562085   23827692
                [...]
1403 808676983   19397785
1404 808676983   50393960
1405 808676983  113419517

If you sum the values of the following column in df1, you get 1405, which matches nrow(df2). I believe df2 is what you wanted in the first place.

Upvotes: 0

Luiz Felipe Freitas

Reputation: 151

The following piece of code works, although it may not be the most efficient approach for large datasets, since final is re-created by rbind on every iteration.

for (i in 1:totalusers)
{
  userids[i] <- ids[i,1]
  following <- get_friends(userids[i])
  final <- rbind(final, data.frame(userids=userids[i], following=following))
}

I ended up with this:

userids                    ids
2953382183           540219772
2953382183  757699150507020288
2953382183          2392165598
2953382183           628569910
2953382183           576547113
2953382183           181996651
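
For larger ID lists, one possible alternative is to build one small data frame per user and bind them all in a single rbind call, instead of growing final inside the loop. The sketch below is only an illustration under stated assumptions: fake_get_friends and build_edge_list are hypothetical names, and fake_get_friends is an offline stand-in for get_friends that is assumed to return a one-column data frame named ids, as in the output shown in the question. The stub's friend lists are made up for the example.

```r
# Hypothetical offline stand-in for rtweet::get_friends(): returns a
# one-column data frame named `ids` (an assumption based on the question).
fake_get_friends <- function(user) {
  friends <- list(
    "2953382183" = c("540219772", "757699150507020288"),
    "2832407758" = c("2392165598"),
    "522476436"  = character()  # a user who follows nobody
  )
  data.frame(ids = friends[[user]], stringsAsFactors = FALSE)
}

# Build one data frame per user, then bind them with a single rbind() call.
# `fetch` is the function used to look up friends; swap in get_friends
# when running against the real API.
build_edge_list <- function(user_ids, fetch = fake_get_friends) {
  pieces <- lapply(user_ids, function(u) {
    friends <- fetch(u)
    data.frame(userids = rep(u, nrow(friends)),
               following = friends$ids,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

edges <- build_edge_list(c("2953382183", "2832407758", "522476436"))
print(edges)
```

With the real package, the call would presumably look like build_edge_list(as.character(ids$user_id), fetch = get_friends), provided get_friends returns the ids column in the installed version. Keeping the IDs as character strings avoids rounding the very long ones.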

Upvotes: 1
