Reputation: 129
I am scraping comments from Reddit and trying to remove empty rows/comments.
A number of rows appear empty, though I cannot seem to remove them. When I use is_empty they do not appear empty.
> Reddit[25,]
[1] ""
> is_empty(Reddit$text[25])
[1] FALSE
> Reddit <- subset(Reddit, text != "")
> Reddit[25,]
[1] ""
Am I missing something? I've tried a couple of other methods to remove these rows and they haven't worked either.
Edit: Included dput example in answer to comments:
RedditSample <- data.frame(text=
c("I liked coinbase, used it before. But the fees are simply too much. If they were to take 1% instead 2.5% I would understand. It's much simpler and long term it doesn't matter as much.",
"But Binance only charges 0.1% so making the switch is worth it fairly quickly. They also have many more coins. Approval process took me less than 10 minutes, but always depends on how many register at the same time.",
"", "Here's a 10%/10% referal code if you chose to register: KHELMJ94",
"What is a spot wallet?"))
Upvotes: 0
Views: 121
Reputation: 1969
You could use the string length functions, for example from the tidyverse, which includes the stringr package:
library(tidyverse)
Reddit %>%
  filter(str_length(text) > 0)
Or base R:
Reddit[nchar(Reddit$text) > 0, ]
Upvotes: 0
Reputation: 206197
Actually, the data you shared doesn't contain an empty string; it contains a Unicode zero-width space character (U+200B), which prints as nothing but is still one character. You can see that with
charToRaw(RedditSample$text[3])
# [1] e2 80 8b
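To see why comparing against "" or checking the length misses this row, it can help to inspect a zero-width space on its own. This is only a small sketch; `zwsp` is a name introduced here for illustration, and the character is written as an explicit escape so it is visible in the source:

```r
# A zero-width space, written as an escape instead of pasted invisibly
zwsp <- "\u200b"

zwsp == ""                    # FALSE: it is not the empty string
nchar(zwsp)                   # 1: it counts as one character
nchar(zwsp, type = "bytes")   # 3: the UTF-8 bytes e2 80 8b
utf8ToInt(zwsp)               # 8203, i.e. code point U+200B
```

So both `text != ""` and `nchar(text) > 0` treat the row as non-empty.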
You could keep only the rows that contain at least one "word" character (a letter, digit, or underscore) by matching with a regular expression:
subset(RedditSample, grepl("\\w", text))
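If you would rather not drop comments that consist only of punctuation or emoji (which contain no "word" character), one alternative is to strip the zero-width space and surrounding whitespace first and then test for emptiness. This is a sketch under assumptions: `sample_df`, `cleaned`, and `kept` are hypothetical names, the sample rows are made up for illustration, and only U+200B is handled (other invisible characters would need their own replacement):

```r
# Hypothetical sample; the second element is an explicit zero-width space
sample_df <- data.frame(
  text = c("What is a spot wallet?", "\u200b", "", "  ", "!!!"),
  stringsAsFactors = FALSE
)

# Remove zero-width spaces, then trim ordinary leading/trailing whitespace
cleaned <- trimws(gsub("\u200b", "", sample_df$text, fixed = TRUE))

# Keep only rows whose cleaned text is non-empty
kept <- sample_df[nzchar(cleaned), , drop = FALSE]
kept$text
```

Here the rows holding only a zero-width space, an empty string, or plain spaces are removed, while "!!!" is kept; the `\\w` filter would drop that row as well.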
Upvotes: 2