Reputation: 129
I'm trying to run the following and gives me an error message.
data <- c("Who said we cant have a lil dance party while were stuck in Quarantine? Happy Friday Cousins!! We got through another week of Quarantine. Lets continue to stay safe, healthy and make the best of the situation. . . Video: . . - #blackgirlstraveltoo #everydayafrica #travelnoire #blacktraveljourney #essencetravels #africanculture #blacktravelfeed #blacktravel #melanintravel #ethiopia #representationmatters #blackcommunity #Moyoafrika #browngirlbloggers #travelafrica #blackgirlskillingit #passportstamps #blacktravelista #blackisbeautiful #weworktotravel #blackgirlsrock #mytravelcrush #blackandabroad #blackgirlstravel #blacktravel #africanamerican #africangirlskillingit #africanmusic #blacktravelmovement #blacktravelgram",
"#Copingwiththelockdown... Festac town, Lagos. #covid19 #streetphotography #urbanphotography #copingwiththelockdown #documentaryphotography #hustlingandbustling #cityscape #coronavirus #busyroad #everydaypeople #everydaylife #commute #lagosroad #lagosmycity #nigeria #africa #westafrica #lagos #hustle #people #strength #faith #nopoverty #everydayeverywhere #everydayafrica #everydaylagos #nohunger #chroniclesofonyinye",
"Peace Everywhere. Amani Kila Pahali. Photo by Adan Galma . * * * * * * #matharestories #mathare #adangalma #everydaymathare #everydayeverywhere #everydayafrica #peace #amani #knowmathare #streets #spi_street #mathareslums")
data_df <- as.data.frame(data)
remove_reg <- "&|<|>"
tidy_data <- data_df %>%
mutate(text = str_remove_all(text, remove_reg)) %>%
unnest_tokens(word, text, token = "data_df") %>%
filter(!word %in% stop_words$word,
!word %in% str_remove_all(stop_words$word, "'"),
str_detect(word, "[a-z]"))
It gives me the following error message:
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), : argument
str
should be a character vector (or an object coercible to)"
How can I fix it?
Upvotes: 0
Views: 418
Reputation: 11643
The main problem is that you gave your text column the name data
but then referred to it later as text
. Try it something more like this:
library(tidyverse)
library(tidytext)
text <- c("Who said we cant have a lil dance party while were stuck in Quarantine? Happy Friday Cousins!! We got through another week of Quarantine. Lets continue to stay safe, healthy and make the best of the situation. . . Video: . . - #blackgirlstraveltoo #everydayafrica #travelnoire #blacktraveljourney #essencetravels #africanculture #blacktravelfeed #blacktravel #melanintravel #ethiopia #representationmatters #blackcommunity #Moyoafrika #browngirlbloggers #travelafrica #blackgirlskillingit #passportstamps #blacktravelista #blackisbeautiful #weworktotravel #blackgirlsrock #mytravelcrush #blackandabroad #blackgirlstravel #blacktravel #africanamerican #africangirlskillingit #africanmusic #blacktravelmovement #blacktravelgram",
"#Copingwiththelockdown... Festac town, Lagos. #covid19 #streetphotography #urbanphotography #copingwiththelockdown #documentaryphotography #hustlingandbustling #cityscape #coronavirus #busyroad #everydaypeople #everydaylife #commute #lagosroad #lagosmycity #nigeria #africa #westafrica #lagos #hustle #people #strength #faith #nopoverty #everydayeverywhere #everydayafrica #everydaylagos #nohunger #chroniclesofonyinye",
"Peace Everywhere. Amani Kila Pahali. Photo by Adan Galma . * * * * * * #matharestories #mathare #adangalma #everydaymathare #everydayeverywhere #everydayafrica #peace #amani #knowmathare #streets #spi_street #mathareslums")
data_df <- tibble(text)
remove_reg <- "&|<|>"
data_df %>%
mutate(text = str_remove_all(text, remove_reg)) %>%
unnest_tokens(word, text) %>%
anti_join(get_stopwords()) %>%
filter(str_detect(word, "[a-z]"))
#> Joining, by = "word"
#> # A tibble: 105 x 1
#> word
#> <chr>
#> 1 said
#> 2 cant
#> 3 lil
#> 4 dance
#> 5 party
#> 6 stuck
#> 7 quarantine
#> 8 happy
#> 9 friday
#> 10 cousins
#> # … with 95 more rows
If you are specifically interested in Twitter data, consider using token = "tweets"
:
data_df %>%
unnest_tokens(word, text, token = "tweets")
#> Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
#> # A tibble: 121 x 1
#> word
#> <chr>
#> 1 who
#> 2 said
#> 3 we
#> 4 cant
#> 5 have
#> 6 a
#> 7 lil
#> 8 dance
#> 9 party
#> 10 while
#> # … with 111 more rows
Created on 2020-04-12 by the reprex package (v0.3.0)
This option handles hashtags and usernames well.
Upvotes: 1