manoj rasika

Reputation: 65

How to remove values that don't match a string pattern from a column of a data frame

I have a column in my data frame as shown below.

[Image: screenshot of the rooms column]

I want to keep the values that match the pattern "\\d+Zimmer" and remove the values that consist only of digits, such as "9586" and "927" in the picture. I tried the following gsub call.

gsub("[^\\d+Zimmer]", "", flat_cl_one$rooms) 

But it removes all the digits, as below.

[Image: screenshot of the column after the gsub call]

What regex can I use to get the correct result? Thank you in advance.

Upvotes: 0

Views: 1217

Answers (4)

The fourth bird

Reputation: 163217

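The pattern [^\\d+Zimmer] is a negated character class, not a sequence: it matches any single character that is not one of the characters listed inside the brackets (+, Z, i, m, e, r and, when \\d is read as the digit class, a digit). It therefore cannot express "digits followed by Zimmer".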

Using gsub with perl = TRUE, you can use a negative lookahead, (?!...), to assert that the string does not start with the pattern ^\\d+Zimmer, and then match 1 or more digits only if that assertion is true.

gsub("^(?!^\\d+Zimmer\\b)\\d+\\b", "", flat_cl_one$rooms, perl = TRUE)

See an R demo.
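As a quick check of the call above, here is a minimal sketch on a small vector. The vector `rooms` is made up for illustration (the two bare numbers come from the question); it is not the asker's actual data.

```r
# Hypothetical sample values standing in for flat_cl_one$rooms
rooms <- c("3Zimmer", "9586", "927", "6Zimmer")

# The negative lookahead blocks the match when the string starts with
# digits followed by "Zimmer"; otherwise the leading run of digits is removed.
cleaned <- gsub("^(?!^\\d+Zimmer\\b)\\d+\\b", "", rooms, perl = TRUE)

print(cleaned)
```

The "Zimmer" values are left untouched and the digit-only values become empty strings.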

Upvotes: 1

PaulS

Reputation: 25323

Another possible solution, using stringr::str_extract (I am using @AndrewGillreath-Brown's data, with thanks):

library(tidyverse)

df <- structure(
  list(rooms = c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer")),
  class = "data.frame", 
  row.names = c(NA, -5L))

df %>% 
  mutate(rooms = str_extract(rooms, "\\d+Zimmer"))

#>         rooms
#> 1   647Zimmer
#> 2   394Zimmer
#> 3        <NA>
#> 4        <NA>
#> 5 38210Zimmer
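Note that str_extract returns NA for values without a match, whereas the question asks for the values to be removed. If empty strings are preferred, one option is sketched below in base R, using regexpr and regmatches as a stand-in for str_extract (a swapped-in technique, not the code above). The variable names are illustrative only.

```r
# Same sample values as in the data frame above
rooms <- c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer")

# regexpr() returns -1 for elements without a match
m <- regexpr("[0-9]+Zimmer", rooms)

# Start from empty strings and fill in only the matched elements;
# regmatches() returns one string per element that matched
extracted <- rep("", length(rooms))
extracted[m != -1] <- regmatches(rooms, m)

print(extracted)
```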

Upvotes: 1

Allan Cameron

Reputation: 173793

Just replace the strings that don't contain the word "Zimmer" with an empty string:

flat_cl_one$room[!grepl("Zimmer", flat_cl_one$room)] <- ""

flat_cl_one
#>       room
#> 1  3Zimmer
#> 2  2Zimmer
#> 3  2Zimmer
#> 4  3Zimmer
#> 5         
#> 6         
#> 7  3Zimmer
#> 8  6Zimmer
#> 9  2Zimmer
#> 10 4Zimmer

Data

flat_cl_one <- data.frame(room = c("3Zimmer", "2Zimmer", "2Zimmer", "3Zimmer", 
                                   "9586", "927", "3Zimmer", "6Zimmer", 
                                   "2Zimmer", "4Zimmer"))
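The grepl("Zimmer", ...) test keeps any value that contains "Zimmer" anywhere. If the column could also hold values such as "Zimmer" with no leading number (a hypothetical case, not present in the question's data), an anchored pattern is stricter. A minimal sketch under that assumption:

```r
# "Zimmer" on its own is a hypothetical value added for illustration
room <- c("3Zimmer", "9586", "927", "Zimmer", "4Zimmer")

# TRUE only when the whole string is one or more digits followed by "Zimmer"
strict <- grepl("^[0-9]+Zimmer$", room)

# Blank out everything that does not fit the full pattern
room[!strict] <- ""

print(room)
```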

Upvotes: 1

AndrewGB

Reputation: 16836

We can coerce the column with as.numeric, which returns NA (with a warning) for any value that contains letters. The values that are not NA after coercion are purely numeric, so we replace those with blanks.

library(dplyr)

flat_cl_one %>% 
  mutate(rooms = ifelse(!is.na(as.numeric(rooms)), "", rooms))

Or we can use stringr::str_detect:

library(stringr)

flat_cl_one %>% 
  mutate(rooms = ifelse(str_detect(rooms, "Zimmer", negate = TRUE), "", rooms))

Output

        rooms
1   647Zimmer
2   394Zimmer
3            
4            
5 38210Zimmer

We could do the same thing with filter if you wanted to actually remove those rows.

flat_cl_one %>% 
  filter(is.na(as.numeric(rooms)))

#        rooms
#1   647Zimmer
#2   394Zimmer
#3 38210Zimmer

Data

flat_cl_one <- structure(list(rooms = c("647Zimmer", "394Zimmer", "8796", "9389", 
"38210Zimmer")), class = "data.frame", row.names = c(NA, -5L))
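The as.numeric approach emits an "NAs introduced by coercion" warning for every value that contains letters. A minimal base R sketch that silences it with suppressWarnings is shown below; the helper name `is_numeric` is an illustrative choice, not part of the answer above.

```r
rooms <- c("647Zimmer", "394Zimmer", "8796", "9389", "38210Zimmer")

# as.numeric() yields NA for values with letters; the warning is expected here
is_numeric <- !is.na(suppressWarnings(as.numeric(rooms)))

# Blank out the purely numeric values, keep the rest unchanged
result <- ifelse(is_numeric, "", rooms)

print(result)
```

Keep in mind that as.numeric also accepts strings such as "1e5", so this test is slightly broader than "digits only".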

Upvotes: 2
