Reputation: 67
I have a dataset that contains 150k rows and 40 columns. The dataset consists of characters, strings as well as integers. I want to remove particular cells which contains a specific keyword/phrase that I want to remove. For example, If the initial dataset looks like
A B
1) 1 | My name is Sam.
2) Hello | 2
3) Who are you | The water is green.
I want to remove the cell that contains the keyword "is". So that the final dataset will look like:-
A B
1) 1 | NA
2) Hello | 2
3) Who are you | NA
The keyword can be in any format like a combination of strings or numbers or both.
Upvotes: 1
Views: 977
Reputation: 887851
We can assign it to NA
by creating a logical index with grep
df1$B[grepl("\\bis\\b", df1$B)] <- NA
If is
can be also a substring, then remove the word boundary
df1$B[grepl("is", df1$B)] <- NA
It is also possible that the OP wants to assign all non-numeric elements to NA. In that case, just converting the column to numeric
(assuming it is character
class)
df1$B <- as.numeric(df1$B)
If there are multiple columns, then we can use lapply
colsofInterest <- 2:ncol(df1)
df1[colsofInterest] <- lapply(df1[colsofInterest],
function(x) replace(x, grepl("\\bis\\b", x), NA))
Or this can be also done with tidyverse
library(tdivyerse)
df1 %>%
mutate_at(vars(colsofInterest), funs(replace(., str_detect(., "\\bis\\b"), NA)))
Upvotes: 2
Reputation: 4169
Using grepl you can replace the entries in a single column that contain the word "is" with NA as Akrun has shown. To make that occur over all columns, you can convert the dataframe to a vector first, subbing out the strings with "is", and then rebuilding the dataframe.
k <- 150
n <- 1000 * k
df1 <- data.frame(matrix(sample(c("Hi, my name is Khal Drogo", "Hello Khal Drogo", 1, 2), replace = T, size = n), ncol = k))
dfvec <- as.vector(unlist(df1))
dfvec[grepl("\\bis\\b", dfvec)] <- NA
df1 <- data.frame(matrix(dfvec, ncol = k))
head(df1[,1:5])
Which gives this:
> head(df1[,1:5])
X1 X2 X3 X4 X5
1 Hello Khal Drogo <NA> 1 2 1
2 Hello Khal Drogo 2 <NA> 1 Hello Khal Drogo
3 Hello Khal Drogo Hello Khal Drogo 1 1 <NA>
4 <NA> 1 Hello Khal Drogo 2 2
5 2 Hello Khal Drogo <NA> 2 1
6 2 Hello Khal Drogo Hello Khal Drogo Hello Khal Drogo 2
Upvotes: 2