Reputation: 1
How do I load a folder of .txt files for text mining with tidytext?
I came across Silge & Robinson "Text mining with R: A tidy approach" (https://www.tidytextmining.com/) and it seems very promising for my purposes. But I'm very new to R (trying to learn it for this very purpose) so I'm stumbling on some pretty basic problems.
While I can follow and reproduce the examples, they mostly start with importing existing libraries (e.g. janeaustenr or gutenbergr), whereas what I have is a folder of 30 txt files (each containing an annual declaration by the Swedish foreign minister to parliament).
I've sort of managed to do it backwards by using some other tutorials and the tm package to first create a corpus, then a DTM, which I can then turn into a tidy data frame. But I guess there must be a simpler way to go directly from a folder of txt files to a tidy data frame.
Upvotes: 0
Views: 2366
Reputation: 11613
If you have a folder with .txt files in it, you can read them into a data frame called tbl that has a single column called text with code like this:
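library(tidyverse)
# list.files() expects a regular expression, so match names ending in ".txt"
tbl <- list.files(pattern = "\\.txt$") %>%
  # read each file into one string (read_file() comes from readr)
  map_chr(read_file) %>%
  # tibble() replaces the deprecated data_frame()
  tibble(text = .)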
This uses a function from base R to find the files (list.files()) and a function from purrr (map_chr()) to iterate over all the files, giving one row per file. Check out a related question here.
After that, you can move on to other analytical tasks.
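As a rough sketch of what that next step could look like, the snippet below keeps each file name as a document id and then splits the text into one word per row with tidytext's unnest_tokens(). The helper names read_txt_folder() and tidy_words(), and the column names document and word, are just illustrative choices of mine, not anything from the book or the packages.

```r
library(tidyverse)
library(tidytext)

# Hypothetical helper: read every .txt file in `dir` into a tibble
# with one row per file, keeping the file name as a document id.
read_txt_folder <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  tibble(
    document = basename(files),
    text     = map_chr(files, read_file)
  )
}

# Hypothetical helper: one row per word per document.
# unnest_tokens() lowercases and strips punctuation by default.
tidy_words <- function(tbl) {
  tbl %>% unnest_tokens(word, text)
}
```

Assuming your files live in a folder called "speeches" (a made-up name), something like read_txt_folder("speeches") %>% tidy_words() %>% count(document, word, sort = TRUE) should give you word counts per file, which is the same tidy shape the book's examples work with.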
Upvotes: 3