Reputation: 115
I am working on a text mining project with R. The file size is over 100 MB. I managed to read the file and did some text processing; however, when I get to the point of removing stop words, RStudio crashes. What would be the best solution, please?
Should I split the file into 2 or 3 files, process them, and then merge them again before applying any analytics? Does anyone have code to split the file? I tried several options available online and none of them seemed to work.
Here is the code I used. Everything worked smoothly except removing the stop words:
# Install
install.packages("tm") # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes
# Load
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library(readr)
doc <- read_csv(file.choose())
docs <- Corpus(VectorSource(doc))
docs
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove english common stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
Upvotes: 0
Views: 660
Reputation: 23608
If you have a lot of words in the corpus, R will take a long time removing stopwords. tm's removeWords is basically one giant gsub and works like this:
gsub(sprintf("(*UCP)\\b(%s)\\b",
             paste(sort(words, decreasing = TRUE), collapse = "|")),
     "", x, perl = TRUE)
Because every word (x) in the corpus is checked against the stopwords, and a 100 MB file contains a lot of words, RStudio might crash because it doesn't receive a response from R for a while. I'm not sure if there is a timeout built into RStudio somewhere.
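As a minimal sketch of what that means in practice (the example string is made up), removeWords does nothing more than that gsub:

library(tm)

x <- "this is an example sentence about text mining"
words <- stopwords("english")

removeWords(x, words)
# roughly: "   example sentence  text mining"

# equivalent to the gsub shown above
gsub(sprintf("(*UCP)\\b(%s)\\b",
             paste(sort(words, decreasing = TRUE), collapse = "|")),
     "", x, perl = TRUE)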
Now you could run this code in the R console; this shouldn't crash, but you might wait a long time. You could use the beepr package to play a sound when the process is done.
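For example, a minimal sketch assuming docs is the corpus built in the question:

library(tm)
library(beepr)

# run the slow step, then get an audible notification when it finishes
docs <- tm_map(docs, removeWords, stopwords("english"))
beep()  # plays a short notification sound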
If possible, my advice would be to switch to the quanteda package, as it runs in parallel out of the box, is better documented and supported, and has fewer UTF-8 issues than tm. At least, that is my experience.
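A minimal sketch of the same pre-processing in quanteda, assuming the text lives in a character column called "text" (adjust the column name to your file):

library(readr)
library(quanteda)

doc <- read_csv(file.choose())

# build a corpus from the text column, then tokenize
corp <- corpus(doc, text_field = "text")   # "text" is an assumed column name
toks <- tokens(corp, remove_numbers = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("english"))

# sparse document-feature matrix for further analysis
dfmat <- dfm(toks)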
But you could also try to run your tm code in parallel, as in the code below, and see if that works a bit better:
library(tm)
# your code reading in files
library(parallel)
cores <- detectCores()
# use cores - 1 if you want to keep using the machine while the code is running
cl <- makeCluster(cores)
tm_parLapply_engine(cl)
docs <- Corpus(VectorSource(doc))
# Convert the text to lower case, remove numbers and stopwords
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
# rest of tm code if needed
tm_parLapply_engine(NULL)
stopCluster(cl)
If you are going to do calculations on the big document-term matrix that you will get with this many words, make sure you use functions from the slam package (installed along with tm). These functions keep the document-term matrix in a sparse form; otherwise it might be converted into a dense matrix and your R session will crash because of too much memory consumption.
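For example, a minimal sketch assuming docs is the corpus from above: slam's col_sums()/row_sums() operate on the sparse matrix directly, whereas as.matrix() first expands it into a dense matrix.

library(tm)
library(slam)

dtm <- DocumentTermMatrix(docs)    # stored as a sparse simple_triplet_matrix

# sparse-aware and memory-friendly
term_freq <- col_sums(dtm)
doc_len   <- row_sums(dtm)

# dense and risky for a 100 MB corpus:
# term_freq <- colSums(as.matrix(dtm))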
Upvotes: 1