Create tm corpus including text (tweet) attributes from dataframe

Question

I have a data frame of tweets with creation dates, tweet ids, and favorite and retweet counts. I want to create a corpus in which each document carries its favorite and retweet counts as variables. I also want the documents to be identified by tweet id rather than by the auto-generated "doc 0001"-style ids.

I start with the data below; the rest of the code follows it.

                   id
1: 737243856144629760
2: 737242308261842945
3: 737242189055594496
4: 737242018687164416
5: 737241411465170944
6: 737239685295181824
                                                                                                                                    text
1:                                                    Have a great Memorial Day and remember that we will soon MAKE AMERICA GREAT AGAIN!
2:                 "@NBCDFW: Trump rallies veterans at annual Rolling Thunder Gathering https://twitter.com/b08FcMlgkr https://twitter.com/RCDeLvHQqD"
3:                "@FrankyLamouche: how many of donald's rolling thunder brigade will sign up and go to war for him in the middle east."
4:    "@MariaErnandez3b: Trump Supports Rolling Thunder Rally #TRUMP STRONG https://twitter.com/pfVXQ8NdZu" So true, and remember the M.I.A.'s!
5:     "@ScottWRasmussen: Donald Trump and Bikers Share Affection at Rolling Thunder Rally https://twitter.com/ZZl2sc29dn" A great day in D.C.!
6: "@TeaPartyNevada: #Trump2016 "Illegals are taken care of better than our veterans."  https://twitter.com/KKIgM4rNma https://twitter.com/1cEZ8wG7Cy"
   favorited favoriteCount replyToSN             created truncated replyToSID replyToUID
1:     FALSE         25944        NA 2016-05-30 11:26:47     FALSE         NA         NA
2:     FALSE          9268        NA 2016-05-30 11:20:38     FALSE         NA         NA
3:     FALSE          6739        NA 2016-05-30 11:20:09     FALSE         NA         NA
4:     FALSE         15417        NA 2016-05-30 11:19:29     FALSE         NA         NA
5:     FALSE          7192        NA 2016-05-30 11:17:04     FALSE         NA         NA
6:     FALSE          9834        NA 2016-05-30 11:10:12     FALSE         NA         NA
                                                                           statusSource      screenName retweetCount
1: Twitter for Android realDonaldTrump         9455
2: Twitter for Android realDonaldTrump         2744
3: Twitter for Android realDonaldTrump         1604
4: Twitter for Android realDonaldTrump         4237
5: Twitter for Android realDonaldTrump         2148
6: Twitter for Android realDonaldTrump         3545
   isRetweet retweeted longitude latitude
1:     FALSE     FALSE        NA       NA
2:     FALSE     FALSE        NA       NA
3:     FALSE     FALSE        NA       NA
4:     FALSE     FALSE        NA       NA
5:     FALSE     FALSE        NA       NA
6:     FALSE     FALSE        NA       NA
                                                                                                                                cleantxt
1:                                                    have a great memorial day and remember that we will soon make america great again!
2:                 "@nbcdfw: trump rallies veterans at annual rolling thunder gathering https://twitter.com/b08fcmlgkr https://twitter.com/rcdelvhqqd"
3:                "@frankylamouche: how many of donald's rolling thunder brigade will sign up and go to war for him in the middle east."
4:    "@mariaernandez3b: trump supports rolling thunder rally #trump strong https://twitter.com/pfvxq8ndzu" so true, and remember the m.i.a.'s!
5:     "@scottwrasmussen: donald trump and bikers share affection at rolling thunder rally https://twitter.com/zzl2sc29dn" a great day in d.c.!
6: "@teapartynevada: #trump2016 "illegals are taken care of better than our veterans."  https://twitter.com/kkigm4rnma https://twitter.com/1cez8wg7cy"

I try to convert it to a corpus with:

myReader <- readTabular(mapping=list(content="cleantxt", id="id", created="created", retweet="retweetCount", fav="favoriteCount"))
trumptweetsenhanced <- VCorpus(DataframeSource(trumptweets.df), readerControl=list(reader=myReader))

However, when I convert the corpus back to a data frame, the added variables are missing and the documents still have the default ids:

> head(trumptweetsenhanced_dataframe.df)
      docs                                                                            text
1 doc 0001                            great memori day rememb will soon make america great
2 doc 0002                           nbcdfw trump ralli veteran annual roll thunder gather
3 doc 0003       frankylamouch mani donald roll thunder brigad will sign go war middl east
4 doc 0004     mariaernandezb trump support roll thunder ralli trump strong true rememb ms
5 doc 0005 scottwrasmussen donald trump biker share affect roll thunder ralli great day dc
6 doc 0006                            teapartynevada trump illeg taken care better veteran
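
For comparison, here is a minimal sketch of one way to keep the ids and counts attached, assuming tm 0.7-2 or later. In those versions, as far as I know, `readTabular` is no longer available, and `DataframeSource` expects the first two columns to be named `doc_id` and `text` and treats any remaining columns as document-level metadata. The small `tweets` data frame and the name `out` are illustrative only. The values come from the first two rows above, with the second tweet's text shortened.

```r
library(tm)

# Toy data frame built from the first two rows shown above.
# Ids are stored as character so the 18-digit values are not rounded.
tweets <- data.frame(
  doc_id        = c("737243856144629760", "737242308261842945"),
  text          = c("have a great memorial day and remember that we will soon make america great again!",
                    "trump rallies veterans at annual rolling thunder gathering"),  # shortened
  created       = c("2016-05-30 11:26:47", "2016-05-30 11:20:38"),
  retweetCount  = c(9455, 2744),
  favoriteCount = c(25944, 9268),
  stringsAsFactors = FALSE
)

# Columns after doc_id and text become document-level metadata.
corp <- VCorpus(DataframeSource(tweets))

meta(corp[[1]], "id")  # id of the first document, taken from doc_id
meta(corp)             # data frame holding created, retweetCount, favoriteCount

# Rebuild a data frame that keeps ids and metadata next to the text.
out <- data.frame(
  doc_id = vapply(content(corp), meta, character(1), "id"),
  text   = vapply(content(corp), as.character, character(1)),
  meta(corp),
  stringsAsFactors = FALSE
)
```

The point of the sketch is that the extra variables live in the corpus metadata rather than in the document text, so a conversion that only extracts the text will not carry them over unless `meta(corp)` is bound back on explicitly.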
