Rohan Sagar

Reputation: 31

Cannot get past data(stop_words) when analyzing text with tidytext

It's my first attempt at text mining and I have run into a wall. This is what I have done thus far:

library(tm)
library(tidytext)
library(dplyr)
library(ggplot2)

text1 <- c("Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,
Set gem-like and fair between mounts and sea-
Your children salute you. dear land of the free.
Green land of Guyana, our heroes of yore,
Both bondsman and free, laid their bones on your shore,
This soil so they hallowed, and from them are we,
All sons of one mother, Guyana the free
Great land of Guyana, diverse though our strains,
We are born of their sacrifice, heirs of their pains,
And ours is the glory their eyes did not see –
One Land of six peoples, united and free.
Dear Land of Guyana, to you will we give
Our homage, our service each day that we live;
God guard you, great Mother, and make us to be
More worthy our heritage – land of the free.")

text1 
newtext1 <- data_frame(line = 1:16, text = text1)
newtext1

newtext1 %>%
  unnest_tokens(word, text)

data(stop_words)

newtext1 <- newtext1 %>%
  anti_join(newtext1)

newtext1 %>%
  count(newtext1, sort = TRUE)

I have not been able to move forward from data(stop_words). Thanks in advance.

Rohan

Upvotes: 1

Views: 52

Answers (1)

Carl

Reputation: 7540

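A few things are getting in the way. data_frame(line = 1:16, text = text1) recycles the single string in text1, so each of the 16 rows holds the whole text; you could use read_lines to put each line into a separate row in the data frame instead. The result of unnest_tokens is never assigned, so make sure to save the unnested tokens before trying to anti-join. Finally, anti_join should be given stop_words rather than newtext1 itself, and count should be given the word column.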

library(tidyverse)
library(tidytext)

text1 <- c("Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,
Set gem-like and fair between mounts and sea-
Your children salute you. dear land of the free.
Green land of Guyana, our heroes of yore,
Both bondsman and free, laid their bones on your shore,
This soil so they hallowed, and from them are we,
All sons of one mother, Guyana the free
Great land of Guyana, diverse though our strains,
We are born of their sacrifice, heirs of their pains,
And ours is the glory their eyes did not see –
One Land of six peoples, united and free.
Dear Land of Guyana, to you will we give
Our homage, our service each day that we live;
God guard you, great Mother, and make us to be
More worthy our heritage – land of the free.")

new_text <- read_lines(text1) %>% 
  as_tibble() %>% 
  unnest_tokens(word, value) %>% 
  anti_join(stop_words)
#> Joining with `by = join_by(word)`

new_text %>% 
  count(word, sort = TRUE)
#> # A tibble: 46 × 2
#>    word         n
#>    <chr>    <int>
#>  1 land         7
#>  2 free         5
#>  3 guyana       5
#>  4 dear         3
#>  5 mother       2
#>  6 bondsman     1
#>  7 bones        1
#>  8 born         1
#>  9 children     1
#> 10 day          1
#> # ℹ 36 more rows

Created on 2024-04-14 with reprex v2.1.0
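
If you still want the line numbers from your original attempt, the same pipeline can carry a line column along. This is only a minimal sketch under a few assumptions: it uses base strsplit instead of read_lines, a two-line stand-in for the full text, and the names lines and tokens are just illustrative.

```r
library(dplyr)
library(tidytext)

# Short stand-in for the full text (first two lines only)
text1 <- "Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,"

# Split on newlines so each line of verse becomes its own element
lines <- strsplit(text1, "\n", fixed = TRUE)[[1]]

tokens <- tibble(line = seq_along(lines), text = lines) %>%
  unnest_tokens(word, text) %>%        # one row per word, lower-cased
  anti_join(stop_words, by = "word")   # drop rows whose word is a stop word

# Word frequencies, most common first
tokens %>%
  count(word, sort = TRUE)
```

Passing by = "word" explicitly also silences the "Joining with" message. stop_words is available as soon as tidytext is loaded, so the data(stop_words) call is optional.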

Upvotes: 0
