Reputation: 151
I have a list of Twitter IDs that used a specific hashtag, and now I'm trying to make a network graph to see who they follow. With the brand new rtweet package, the idea is that for each user_id I use the get_friends function and end up with a two-column table: userids | following.
The problem is that instead of two columns, I end up with just one. Here's what I'm doing based on similar questions:
#this is where the ids list comes from
head(ids)
user_id freq
2953382183 291
2832407758 178
522476436 149
773707421579677696 117
1296286704 113
773555423970529280 113
#for each user_id, get_friends() shows me who the user is following
userids <- ids[1,1]
following <- get_friends(userids)
head(following)
ids
540219772
757699150507020288
2392165598
628569910
576547113
181996651
#NOW I'LL TRY TO FILL A NEW DATA FRAME FOR EACH "user_id" WITH ALL FOLLOWING "ids"
#initializing an empty data frame
final <- data.frame(userids = character(), following =character())
totalusers <- nrow(ids) #ids is a data frame where I got all `user_id`
userids <- NULL
following <- NULL
df <- NULL
for (i in 1:totalusers)
{
userids[i] <- ids[i,1]
following <- get_friends(userids[i]) #get_friends returns a data frame, by package default
df[i] <- data.frame(userids[i], following)
final <- rbind(final, df[i])
}
Does anyone know how I can append the following variable to this data frame? Many thanks.
Upvotes: 1
Views: 1093
Reputation: 1797
For a given set of IDs (ids), you can do the following:
library(rtweet)
library(plyr)
ids<-c("156562085","808676983","847366544183050240")#the users id
list_of_friends<-lapply(ids,get_friends)#get all the friends' ids per each user id
names(list_of_friends)<-ids
list_of_friends2<-lapply(list_of_friends,function(y) dim(y)[1])#get the number of friends
df1<-ldply(list_of_friends2, data.frame)#transform the data into data.frame
names(df1)<-c("user_id","following")
df1
yields:
user_id following
1 156562085 339
2 808676983 1066
3 847366544183050240 0
Additionally, to produce the edge list:
f1<-function(x){
#repeat the user id once per friend and bind it to that user's friend ids
return(cbind(rep(names(list_of_friends[x]),dim(list_of_friends[[x]])[1]),list_of_friends[[x]]))
}
l1<-lapply(names(list_of_friends),f1)
df2<-ldply(l1,data.frame)
names(df2)<-c("user_id","friend_id")
yielding df2:
user_id friend_id
1 156562085 26787673
2 156562085 18139619
3 156562085 23827692
[...]
1403 808676983 19397785
1404 808676983 50393960
1405 808676983 113419517
If you sum the values of the following column in df1, you get 1405, which agrees with nrow(df2). I believe df2 is what you wanted in the first place.
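The same counts and edge list can also be sketched in base R, without plyr. This is only an illustration under stated assumptions: the user IDs ("u1", "u2", "u3") and friend IDs are made up, and each element of list_of_friends is assumed to be a one-column data frame named ids, like the get_friends output shown in the question (other rtweet versions may name the column differently).

```r
# Made-up friend lists standing in for lapply(ids, get_friends);
# each element is assumed to be a one-column data frame of friend IDs.
list_of_friends <- list(
  u1 = data.frame(ids = c("f1", "f2", "f3"), stringsAsFactors = FALSE),
  u2 = data.frame(ids = c("f4", "f5"), stringsAsFactors = FALSE),
  u3 = data.frame(ids = character(0), stringsAsFactors = FALSE)  # follows nobody
)

# Number of friends per user (the "following" column of df1).
n_friends <- vapply(list_of_friends, nrow, integer(1))

# Edge list: repeat each user ID once per friend, then flatten the friend IDs.
edges <- data.frame(
  user_id   = rep(names(list_of_friends), times = n_friends),
  friend_id = unlist(lapply(list_of_friends, function(d) d$ids),
                     use.names = FALSE),
  stringsAsFactors = FALSE
)

# The per-user counts and the edge list should agree.
stopifnot(sum(n_friends) == nrow(edges))
print(edges)
```

A user with no friends contributes zero rows, so it appears in the counts but not in the edge list.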
Upvotes: 0
Reputation: 151
The following piece of code works, although it's probably not the most efficient way for large datasets, since calling rbind inside a loop copies the growing data frame on every iteration.
for (i in 1:totalusers)
{
userids[i] <- ids[i,1]
following <- get_friends(userids[i])
final <- rbind(final, data.frame(userids=userids[i], following=following))
}
I ended up with this:
userids ids
2953382183 540219772
2953382183 757699150507020288
2953382183 2392165598
2953382183 628569910
2953382183 576547113
2953382183 181996651
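For larger ID lists, one possible variation is to collect one small data frame per user in a list and bind them in a single call at the end. The sketch below is not tested against the Twitter API: fake_get_friends is a hypothetical stand-in for get_friends(), the IDs are made up, and the ids column name is assumed from the output shown in the question.

```r
# Hypothetical stand-in for rtweet::get_friends(); assumed to return a
# one-column data frame of friend IDs, like the output shown in the question.
fake_get_friends <- function(user) {
  friends <- list(
    "111" = c("540219772", "2392165598"),
    "222" = c("628569910"),
    "333" = character(0)  # a user who follows nobody
  )
  data.frame(ids = friends[[user]], stringsAsFactors = FALSE)
}

user_ids <- c("111", "222", "333")  # made-up IDs, kept as character strings

# Build one data frame per user, then bind them all in one rbind call
# instead of growing `final` inside the loop.
pieces <- lapply(user_ids, function(u) {
  following <- fake_get_friends(u)
  data.frame(userids = rep(u, nrow(following)),
             ids = following$ids,
             stringsAsFactors = FALSE)
})
final <- do.call(rbind, pieces)
print(final)
```

The IDs are kept as character strings because values such as 773707421579677696 are larger than 2^53 and cannot be stored exactly as R doubles.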
Upvotes: 1