GKyle

Reputation: 679

Filtering a dataframe created with read.csv in R using regex

I am trying to filter a dataframe in R that I created by reading an Excel CSV with:

    data <-  read.csv(file="C:/Users/gskyle/Desktop/Keyword Planner.csv",
             stringsAsFactors = FALSE,
             header = TRUE)

It looks like:

keyword
london venues to hire.
:
how to get a gig in london
0
buy a pub in london)
london blues bars909
jazz vortex!
london events tickets
happenings in london

I want to remove the rows that contain punctuation or numbers, so with dplyr I use:

require(dplyr)

filtered.data <- filter(data, !grepl('[:digit:]|[:punct:]', keyword))

However, my result is:

filtered.data

  keyword
1       0

Only the 4th row remains, and it is a digit. I have tried specifying the encoding in the read.csv call with encoding = "ANSI", but no luck. Can someone please help?

Upvotes: 0

Views: 219

Answers (1)

eipi10

Reputation: 93811

Here are two options:

library(dplyr)

data %>% filter(!grepl("[[:digit:]]|[[:punct:]]", keyword))

                     keyword
1 how to get a gig in london
2      london events tickets
3       happenings in london

Per @RichardScriven's comment, you can do this in base R as follows:

data[!grepl("[[:digit:]]|[[:punct:]]", data$keyword),]

But if you want to keep all the text while removing numbers and punctuation, you can do this:

data %>% mutate(keyword = gsub("[[:digit:]]|[[:punct:]]", "", keyword)) %>%
  filter(keyword != "")

                     keyword
1      london venues to hire
2 how to get a gig in london
3        buy a pub in london
4          london blues bars
5                jazz vortex
6      london events tickets
7       happenings in london
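
For completeness, the same clean-up can probably be done without dplyr. The following is only a sketch: the `data` frame is rebuilt by hand from the sample in the question (an assumption, since the original CSV is not available), and the `trimws()` step is an extra precaution not present in the answer above.

```r
# Hand-built stand-in for the CSV from the question (assumed contents)
data <- data.frame(
  keyword = c("london venues to hire.", ":", "how to get a gig in london", "0",
              "buy a pub in london)", "london blues bars909", "jazz vortex!",
              "london events tickets", "happenings in london"),
  stringsAsFactors = FALSE
)

# Strip digits and punctuation, using both classes in one bracket expression
cleaned <- gsub("[[:digit:][:punct:]]", "", data$keyword)

# Extra step (not in the original answer): trim stray leading/trailing spaces
cleaned <- trimws(cleaned)

# Drop rows that became empty after cleaning
result <- data.frame(keyword = cleaned[cleaned != ""], stringsAsFactors = FALSE)
result
```

On this sample, `result` should hold the same seven keywords as the dplyr version.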

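Note that you need double square brackets, rather than single. The outer pair opens a bracket expression and the inner [:digit:] names the character class, so a single-bracketed [:digit:] is read as the set of literal characters :, d, i, g and t, which is why almost every row was dropped in your attempt. Also, the class for digits is [[:digit:]], rather than [[:digits:]]. Finally, you can save some typing by using \\d instead of [[:digit:]].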
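
To make the bracket issue concrete, here is a small sketch comparing the patterns. The `keyword` vector is typed in by hand from the sample in the question (an assumption), and the variable names are only illustrative.

```r
# Hand-typed copy of the sample keywords from the question (assumed contents)
keyword <- c("london venues to hire.", ":", "how to get a gig in london", "0",
             "buy a pub in london)", "london blues bars909", "jazz vortex!",
             "london events tickets", "happenings in london")

# Single brackets: "[:digit:]" is just the character set {:, d, i, g, t} and
# "[:punct:]" is {:, p, u, n, c, t}, so any row containing one of those
# letters is treated as a match
single <- grepl("[:digit:]|[:punct:]", keyword)
keyword[!single]   # only "0" survives, as reported in the question

# Double brackets: real POSIX character classes
posix <- grepl("[[:digit:]]|[[:punct:]]", keyword)
keyword[!posix]    # the three clean keywords

# Two shorter spellings that should give the same logical vector
combined  <- grepl("[[:digit:][:punct:]]", keyword)  # both classes in one set
shorthand <- grepl("\\d|[[:punct:]]", keyword)       # \\d for digits
identical(posix, combined)
identical(posix, shorthand)
```

Putting both classes inside one bracket expression avoids the alternation entirely, which some may find easier to read.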

Upvotes: 2
