Using the readLines() function, I have imported a txt file, which stored sentences within multiple paragraphs like this:
sentence1. sentence2. sentence3.
sentence4. sentence5.
sentence6. sentence7.
For further analysis, I would like to apply the sentiment_by() function to the imported text. When I do so, I receive a sentiment value for each paragraph rather than one for the file as a whole. I therefore want to remove the paragraph breaks so that I receive only one sentiment coefficient. To do so, I would need to transform the text so that it looks like this:
sentence1. sentence2. sentence3. sentence4. sentence5. sentence6. sentence7.
If I ran sentiment_by() on this piece of text, it would yield one coefficient for the whole text. Is there a way to remove the paragraph breaks in R before I carry on with the analysis?
Upvotes: 0
Views: 521
Reputation: 324
If each paragraph you grab is an element of a character vector, you can strip leading and trailing tabs and newlines from it with trimws() (and other whitespace characters if needed).
trimmed_text = trimws(text_var, which = "both", whitespace = "[\t\r\n]")
There are other things you can tweak as shown here.
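Note that readLines() returns one vector element per line, so trimming alone still leaves several elements. A minimal sketch of one way to get a single string, assuming the file content matches the example in the question (the vector below is a stand-in for the readLines() result, and full_text is just an illustrative name):

```r
# Stand-in for the result of readLines(): one element per line of the file.
# The sample lines mirror the question; text_var is the name used above.
text_var <- c("sentence1. sentence2. sentence3.",
              "sentence4. sentence5.",
              "sentence6. sentence7.")

# Strip leading/trailing whitespace from each line and drop empty lines
# (blank lines between paragraphs would otherwise add extra spaces).
trimmed_text <- trimws(text_var)
trimmed_text <- trimmed_text[nzchar(trimmed_text)]

# Join all lines into one string, separated by single spaces.
full_text <- paste(trimmed_text, collapse = " ")

print(full_text)
```

Since full_text is a character vector of length one, passing it to sentiment_by() should give a single coefficient for the whole text rather than one per paragraph.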
Upvotes: 0