Extracting mentions, hashtags, and URLs and placing them in a new column in a Twitter dataset with R

Question

I have a Twitter dataset of 30,000 tweets, and I'm trying to prepare the data for text analysis. I downloaded the dataset with the academictwitteR package in R. Inside the dataset, some columns (such as "user.metrics", "public.metrics", and "entities") are separate data frames. I managed to extract the columns from "user.metrics" and "public.metrics" and merge them with my original dataset without a problem, as follows:

#extract
extract_publicmetrics <- as.data.frame(mytwitterdata$public_metrics)
colnames(extract_publicmetrics)
[1] "retweet_count" "reply_count"   "like_count"    "quote_count"

# add an observation column to bind with the original data (mytwitterdata)
addconsecutivenumbers1 <- cbind(extract_publicmetrics, "observation" = 1:nrow(extract_publicmetrics))
addconsecutivenumbers2 <- cbind(mytwitterdata, "observation" = 1:nrow(mytwitterdata))
# merge the two data frames on the shared observation column
merged.data <- merge(addconsecutivenumbers1, addconsecutivenumbers2, by = "observation")

However, I could not extract the "mentions", "urls", and "hashtags" columns from the "entities" data frame in my dataset. I think this is because "mentions", "urls", and "hashtags" are nested lists in that data frame, e.g.:

class(mytwitterdata$entities$hashtags)
[1] "list"

For example, a tweet may contain no hashtags, one hashtag, or more than one hashtag. I want to create a new column from that list in which the row's value is NA when there is no hashtag, the hashtag as text when there is one, or the hashtags separated by commas when there is more than one.
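
To make the desired output concrete, here is a minimal sketch on toy data. It assumes that each element of the nested list is either NULL or a small data frame, and that the hashtag text sits in a column named "tag"; both are assumptions about the structure rather than something confirmed from the real data. The helper name collapse_entity and the toy values are made up for illustration.

```r
# Hypothetical helper: collapse one field of a nested entities list into a
# single comma-separated string per tweet, or NA when the tweet has none.
collapse_entity <- function(entity_list, field) {
  vapply(entity_list, function(x) {
    # Assumption: an empty entry is NULL, a non-data-frame placeholder,
    # a zero-row data frame, or a data frame without the requested field.
    if (!is.data.frame(x) || nrow(x) == 0 || is.null(x[[field]])) {
      return(NA_character_)
    }
    paste(x[[field]], collapse = ", ")
  }, character(1))
}

# Toy stand-in for mytwitterdata$entities$hashtags (made-up values).
toy_hashtags <- list(
  NULL,
  data.frame(start = 0, end = 5, tag = "rstats"),
  data.frame(start = c(0, 8), end = c(7, 15), tag = c("twitter", "nlp"))
)

# One string (or NA) per tweet, ready to be stored as a new column.
hashtags_flat <- collapse_entity(toy_hashtags, "tag")
print(hashtags_flat)
```

On the real data this would be something like mytwitterdata$hashtags <- collapse_entity(mytwitterdata$entities$hashtags, "tag"). The same helper should apply to "mentions" and "urls" once the relevant column name in those nested data frames is known; the field would need to be checked with str() first.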

Attached is a sample of 10 rows extracted from the "entities" data frame in my dataset:

https://drive.google.com/file/d/1vfyFIObRS9tCxGNJCG9AMyKgxwgwBDMZ/view?usp=sharing

