Reputation: 179
I'm having issues with pulling data from the clipboard when it has lots of punctuation (quotes, commas, etc.) in it. I'm attempting to pull in the entirety of Jane Austen's Pride and Prejudice as a plain text document, copying it to the clipboard and reading it into a variable in R for analysis.
If I run
book <- read.table("clipboard", sep="\n")
I get an "EOF within quoted string" error. If I add the option that stops strings from being converted to factors:
book <- read.table("clipboard", sep="\n", stringsAsFactors=F)
I get the same error. This affects the table by putting multiple paragraphs together where quotations are present. If I open the book in a text editor and remove the double quotes and single quotes, then try either read.table option, the result is perfect.
Is there a way to remove punctuation prior to (or during?) the read.table phase? Would I dump the clipboard data into some kind of big vector then read.table off that vector?
Upvotes: 0
Views: 128
Reputation: 49640
The read.table function is intended to read in data in a rectangular structure and put it into a data frame. I don't expect that the text of a book would fit that pattern in general. I would suggest reading the data with the scan or readLines function in place of read.table. Read the documentation for those functions on how to deal with quotes and separators.
If you still want to remove punctuation, look at ?gsub. If you also want to convert all the characters to upper or lower case, see ?chartr (the same help page documents toupper and tolower).
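As a rough sketch of that suggestion: the snippet below reads lines with readLines and then strips punctuation with gsub. The two sample sentences and the textConnection are stand-ins I've assumed so the example runs anywhere; on Windows you would presumably call readLines("clipboard") instead. The variable names are illustrative only.

```r
# Stand-in for the clipboard contents (assumption: two short lines of the book).
txt <- c("\"My dear Mr. Bennet,\" said his lady to him one day,",
         "\"have you heard that Netherfield Park is let at last?\"")

# readLines returns one character string per line and does no quote processing.
con <- textConnection(txt)
book <- readLines(con)
close(con)

# Remove every punctuation character using the POSIX class [:punct:].
no_punct <- gsub("[[:punct:]]", "", book)

# Optionally fold everything to lower case.
lower <- tolower(no_punct)
print(lower)
```

Because readLines never interprets quote characters, the "EOF within quoted string" problem should not arise, and punctuation can be dropped afterwards only if the analysis really needs it.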
Upvotes: 0
Reputation: 2166
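You need to disable quoting. By default read.table treats both " and ' as quote characters, so an unmatched apostrophe or quotation mark in the text swallows the lines that follow it.
This works for me: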
book <- read.table("http://www.gutenberg.org/cache/epub/1342/pg1342.txt",
                   sep="\n", quote="", stringsAsFactors=FALSE)
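For a quick check that doesn't need a download or a clipboard, here is a small sketch using read.table's text argument. The three sample lines are made up for illustration; the first contains an apostrophe and the second contains double quotes, which is the kind of input that trips the default quote setting.

```r
# Hypothetical sample: an apostrophe on line 1, double quotes on line 2.
sample_text <- paste(
  "It's a truth universally acknowledged",
  "\"My dear Mr. Bennet,\" said his lady",
  "have you heard",
  sep = "\n"
)

# quote = "" turns off quote processing, so ' and " are kept as ordinary characters.
book <- read.table(text = sample_text, sep = "\n", quote = "",
                   stringsAsFactors = FALSE)

print(nrow(book))   # one row per input line
print(book$V1[2])   # the double quotes survive in the data
```

Note that read.table still applies its other defaults here, for example skipping blank lines and treating # as a comment character, so readLines may be the safer choice if exact line-for-line fidelity matters.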
Upvotes: 1