I am trying to filter a data frame in R that I created by reading an Excel CSV file with:
data <- read.csv(file="C:/Users/gskyle/Desktop/Keyword Planner.csv",
stringsAsFactors = FALSE,
header = TRUE)
It looks like:
keyword
london venues to hire.
:
how to get a gig in london
0
buy a pub in london)
london blues bars909
jazz vortex!
london events tickets
happenings in london
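To make this reproducible without the CSV, the same keywords can be typed in directly. This is a sketch that assumes `keyword` is the only column in the file:

```r
# Rebuild the sample data shown above by hand, so the filtering
# code can be run without the original CSV file
data <- data.frame(
  keyword = c("london venues to hire.",
              ":",
              "how to get a gig in london",
              "0",
              "buy a pub in london)",
              "london blues bars909",
              "jazz vortex!",
              "london events tickets",
              "happenings in london"),
  stringsAsFactors = FALSE
)
data
```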
I want to remove the rows that contain punctuation or numbers, so using dplyr I wrote:
require(dplyr)
filtered.data <- filter(data, !grepl('[:digit:]|[:punct:]', keyword))
However, my result is:
filtered.data
keyword
1 0
Only the 4th row remains, and it is a number. I have tried specifying the encoding in the read.csv call with encoding = "ANSI", but no luck. Can someone please help?
Here are two options:
library(dplyr)
data %>% filter(!grepl("[[:digit:]]|[[:punct:]]", keyword))
keyword
1 how to get a gig in london
2 london events tickets
3 happenings in london
Per @RichardScriven's comment, you can do this in base R as follows:
data[!grepl("[[:digit:]]|[[:punct:]]", data$keyword),]
If instead you want to keep the text and just strip out the numbers and punctuation (dropping any keyword that ends up empty), you can do this:
data %>% mutate(keyword = gsub("[[:digit:]]|[[:punct:]]", "", keyword)) %>%
filter(keyword != "")
keyword
1 london venues to hire
2 how to get a gig in london
3 buy a pub in london
4 london blues bars
5 jazz vortex
6 london events tickets
7 happenings in london
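The same strip-then-drop step can be done without dplyr. Here is a base-R sketch, with the question's keywords typed in by hand instead of read from the CSV:

```r
# Sample keywords from the question (entered by hand, not read from the CSV)
data <- data.frame(
  keyword = c("london venues to hire.", ":", "how to get a gig in london",
              "0", "buy a pub in london)", "london blues bars909",
              "jazz vortex!", "london events tickets", "happenings in london"),
  stringsAsFactors = FALSE
)

# Remove every digit and punctuation character from each keyword
data$keyword <- gsub("[[:digit:]]|[[:punct:]]", "", data$keyword)

# Drop the rows that are now empty (":" and "0"); drop = FALSE keeps a data frame
cleaned <- data[data$keyword != "", , drop = FALSE]
rownames(cleaned) <- NULL
cleaned
```

The `drop = FALSE` matters here because `data` has a single column, so plain `[` indexing would otherwise collapse the result to a character vector.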
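Note that you need double brackets rather than single ones, and the class for digits is [[:digit:]], not [[:digits:]]. With single brackets, [:digit:] is read as an ordinary bracket expression that matches any one of the literal characters :, d, i, g or t (and [:punct:] likewise matches :, p, u, n, c or t). That is why nearly every keyword matched and was dropped, while "0" survived: it contains none of those characters. Also, you can save some typing by using \\d instead of [[:digit:]].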
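A quick base-R check of the bracket issue, using the question's keywords typed in by hand. It compares the single-bracket pattern from the question with the corrected one, the \\d shorthand, and a single bracket expression holding both classes (that last form is an alternative spelling, not something from the original answer):

```r
keywords <- c("london venues to hire.", ":", "how to get a gig in london",
              "0", "buy a pub in london)", "london blues bars909",
              "jazz vortex!", "london events tickets", "happenings in london")

# Single brackets: a plain set of literal characters, not POSIX classes
wrong <- grepl("[:digit:]|[:punct:]", keywords)

# Double brackets: the POSIX classes sit inside a bracket expression
right <- grepl("[[:digit:]]|[[:punct:]]", keywords)

# \\d is R's shorthand for a digit
short <- grepl("\\d|[[:punct:]]", keywords)

# Both classes can also share one bracket expression
combined <- grepl("[[:digit:][:punct:]]", keywords)

keywords[!wrong]   # only "0" is left, as in the question
keywords[!right]   # the three keywords with no digits or punctuation
```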