Harry

Reputation: 129

R empty "" rows cannot be removed

I am scraping comments from Reddit and trying to remove empty rows/comments.

Several rows appear empty, but I cannot remove them. When I check them with is_empty, it reports that they are not empty.

> Reddit[25,]
[1] "​"

> is_empty(Reddit$text[25])
[1] FALSE

> Reddit <- subset(Reddit, text != "")
> Reddit[25,]
[1] "​"

Am I missing something? I've tried a couple of other methods to remove these rows and they haven't worked either.

Edit: Included dput example in answer to comments:

RedditSample <- data.frame(text=
c("I liked coinbase, used it before. But the fees are simply too much. If they were to take 1% instead 2.5% I would understand. It's much simpler and long term it doesn't matter as much.", 
"But Binance only charges 0.1% so making the switch is worth it fairly quickly. They also have many more coins. Approval process took me less than 10 minutes, but always depends on how many register at the same time.", 
"​", "Here's a 10%/10% referal code if you chose to register: KHELMJ94", 
"What is a spot wallet?"))

Upvotes: 0

Views: 121

Answers (2)

Jeff Parker

Reputation: 1969

You could filter on string length. For example, with the tidyverse, which includes the stringr package:

library(tidyverse)

Reddit %>%
    filter(str_length(text) > 0)

Or base R:

Reddit[nchar(Reddit$text) > 0, ]

Upvotes: 0

MrFlick

Reputation: 206197

Actually, the data you shared doesn't contain an empty string; it contains a Unicode zero-width space character (U+200B). You can see that with

charToRaw(RedditSample$text[3])
# [1] e2 80 8b
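
A minimal sketch of why both the comparison with "" and a length check miss this row, assuming the cell holds exactly one U+200B character (written here with the `\u200b` escape so it is visible in the source; `zwsp` is just an illustrative name):

```r
# Assumption: the "empty-looking" cell holds a single zero-width space (U+200B).
zwsp <- "\u200b"

zwsp == ""        # FALSE: the string is not empty
nchar(zwsp)       # 1: it contains one (invisible) character
utf8ToInt(zwsp)   # 8203, i.e. 0x200B
```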

You could make sure each row contains at least one non-space character by using a regular expression that matches a "word" character:

subset(RedditSample, grepl("\\w", text))
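
If the zero-width space is the only invisible character of concern, another option is to delete it explicitly and then drop rows that are empty after trimming. This is only a sketch on a small stand-in data frame; `df` and `cleaned` are illustrative names, not part of the original data:

```r
# Stand-in data: row 2 holds only a zero-width space (U+200B), row 3 only spaces.
df <- data.frame(text = c("What is a spot wallet?", "\u200b", "  "),
                 stringsAsFactors = FALSE)

# Delete every U+200B, then trim ordinary leading/trailing whitespace.
cleaned <- trimws(gsub("\u200b", "", df$text, fixed = TRUE))

# Keep only rows that still have at least one character left.
df <- df[nzchar(cleaned), , drop = FALSE]
nrow(df)  # 1
```

Unlike the "\\w" filter, this keeps rows that consist only of punctuation or emoji, which may or may not be what you want for scraped comments.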

Upvotes: 2
