Reputation: 642
I have data frame of several character and numerical variables. I want to set all values starting with 0/n(
to NA
. Theses values may be spread over several columns.
Example:
string <- c("asff", "1\n(", '0asfd', '0\n(asdf)')
num <- c(0,1,2,3)
df <- data.frame(string, num)
##
df
string num
1 asff 0
2 1\n( 1
3 0asfd 2
4 0\n(asdf) 3
Desired output:
string num
1 asff NA
2 1\n( 1
3 NA 2
4 NA 3
The numerical variables are quickly set to NA by:
df%>%na_if(., 0)
But the character string doesn't seem to work as easy with na_if()
, as na_if()
does not seem to work with regex, does it? I tried things like:
na_if(., "^0.")
na_if(., "0.")
na_if(., startsWith("0"))
, but ERROR ...
Maybe a conditionaldplyr::mutate(.=ifelse())
with a regex, but then I have these line breaks "\n)"
, which or not captured by "."
in a regex. Or something with mutate(x = str_replace_all())
.
I can do it column by column (df%>%mutate(x = ifelse(startsWith(x, "0"), NA,x )))
), but that's too much hard coding. I can not apply it to all columns with mutate(across(colnames(), ~ifelse(startsWith(., "0"), NA, .)))
as the numerical columns trigger an error.
Upvotes: 3
Views: 1967
Reputation: 1370
If you're able to do it for one column using mutate
, you should be able to do it for multiple columns using mutate_at()
or mutate_all()
, explained here: https://dplyr.tidyverse.org/reference/mutate_all.html
Without knowing what your data looks like, I think you'd want mutate_all()
to modify all columns which have data which matches your condition.
In this example using the iris
dataset, we replace all instances of 5
with the word five
:
iris %>%
tibble %>%
mutate_all(function(x) str_replace(x, '5', 'five'))
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <chr> <chr> <chr> <chr>
1 five.1 3.five 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.five 0.2 setosa
5 five 3.6 1.4 0.2 setosa
6 five.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 five 3.4 1.five 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.five 0.1 setosa
Or like your condition, we can do this only when the string starts with 5
, using ^5
regex language (^
indicates the start of the string, and 5
means a 5
at the beginning of the string).
iris %>%
tibble %>%
mutate_all(function(x) str_replace(x, '^5', 'five'))
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <chr> <chr> <chr> <chr>
1 five.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 five 3.6 1.4 0.2 setosa
6 five.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 five 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
Update To change the entire value, if it has a 5
at the start, you just need to change the str_replace
function to a function which can change the entire value. In this case, we use an ifelse
statement
iris %>%
tibble %>%
mutate_all(function(x) ifelse(str_detect(x, '^5'), 'had_five', x))
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <dbl> <chr> <dbl> <int>
1 had_five 3.5 1.4 0.2 1
2 4.9 3 1.4 0.2 1
3 4.7 3.2 1.3 0.2 1
4 4.6 3.1 1.5 0.2 1
5 had_five 3.6 1.4 0.2 1
6 had_five 3.9 1.7 0.4 1
7 4.6 3.4 1.4 0.3 1
8 had_five 3.4 1.5 0.2 1
9 4.4 2.9 1.4 0.2 1
10 4.9 3.1 1.5 0.1 1
Another update From your comments, it sounds like you want to apply the function to only character columns. To do this, you can substitute mutate_all(your_fun)
for mutate_if(is.character, your_fun)
- as described in the help documentation at the start of this answer (the same info page describes mutate_all
, mutate_if
and mutate_at
).
Using your sample data as an example, we can set anything beginning with '0'
to NA. I am confused by your example though - do you want to look for '0'
or '0\n('
at the start of the string? Either way, this is how to do it:
# sample data
string <- c("asff", "1\n(", '0asfd', '0\n(asdf)')
num <- c(0,1,2,3)
df <- data.frame(string, num)
# for only a 0 at the start of the string
df %>%
mutate_if(is.character, function(x) ifelse(str_detect(x, '^0'), NA, x))
string num
1 asff 0
2 1\n( 1
3 <NA> 2
4 <NA> 3
# for '0\n(' at the start of the string
df %>%
mutate_if(is.character, function(x) ifelse(str_detect(x, '^0\\n\\('), NA, x))
string num
1 asff 0
2 1\n( 1
3 0asfd 2
4 <NA> 3
Upvotes: 4