Akfak

Reputation: 1

How to load texts for text mining with R Tidytext?

How do I load a folder of .txt files for text mining with tidytext?

I came across Silge & Robinson "Text mining with R: A tidy approach" (https://www.tidytextmining.com/) and it seems very promising for my purposes. But I'm very new to R (trying to learn it for this very purpose) so I'm stumbling on some pretty basic problems.

While I can follow and reproduce the examples, they mostly start with importing existing libraries (e.g. janeaustenr or gutenbergr), whereas what I have is a folder of 30 txt files (each containing an annual declaration by the Swedish foreign minister to parliament).

I've sort of managed to do it backwards, using some other tutorials and the tm package to first create a corpus, then a DTM, which I can then turn into a tidy data frame. But I guess there must be a simpler way to go directly from a folder of txt files to a tidy data frame.

Upvotes: 0

Views: 2366

Answers (1)

Julia Silge

Reputation: 11613

If you have a folder with .txt files in it, you can read them into a data frame called tbl that has a single column called text with code like this:

library(tidyverse)

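# pattern is a regular expression, so anchor it to names ending in .txt
tbl <- list.files(pattern = "\\.txt$") %>% 
        map_chr(~ read_file(.)) %>% 
        tibble(text = .)   # tibble() replaces the deprecated data_frame()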

This uses a function from base R to find the files (list.files()) and a function from purrr (map_chr()) to iterate over all the files, reading each one with read_file() from readr. Check out a related question here.

After that, you can move on to other analytical tasks.
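
As a rough sketch of one possible next step, the example below keeps each file name as a document identifier and splits the text into one word per row with unnest_tokens() from tidytext. The folder name, the two sample files, and the column names document and word are assumptions made up for illustration, so that the code runs on its own; point folder at your own directory instead.

```r
library(tidyverse)
library(tidytext)

# Hypothetical sample data so the sketch is runnable as-is;
# in practice, set `folder` to the directory holding your .txt files.
folder <- file.path(tempdir(), "declarations")
dir.create(folder, showWarnings = FALSE)
write_file("Sweden supports the United Nations.", file.path(folder, "1990.txt"))
write_file("Peace and security remain priorities.", file.path(folder, "1991.txt"))

# Full paths are needed because the files are not in the working directory
files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

# One row per file, then one row per word (lowercased, punctuation dropped)
tidy_words <- tibble(
  document = basename(files),
  text = map_chr(files, read_file)
) %>%
  unnest_tokens(word, text)
```

Keeping the file name in its own column means later steps can compare documents, for example with count(tidy_words, document, word, sort = TRUE).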

Upvotes: 3
