I am having a problem with the following script. When I convert the min and max columns of the data.frame base to numeric with dplyr, they seem to "convert" back to character. The result should be 582, but I get 513.
base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  filter(ocor %>% between(min, max)) %>%
  count()
To fix this, I tried converting the variables inside the dplyr pipeline. However, they seem to convert back:
base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.))) %>%
  filter(ocor %>% between(min, max)) %>%
  count()
class(base$max)
class(base$min)

  n
1 513
> class(base$max)
[1] "character"
> class(base$min)
[1] "character"
Without dplyr I get the correct result, for example:
a <- base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  select(ocor)
class(base$max)
class(base$min)
base$max <- as.integer(base$max)
base$min <- as.integer(base$min)
sum(a >= base$min & a <= base$max)
[1] 582
I don't understand what's going on. Here is a sample of the data for clarification:
head(base)
min max letter pass ocor
1 2 6 c fcpwjqhcgtffzlbj 2
2 6 9 x xxxtwlxxx 6
3 7 10 q nfbrgwqlvljgq 2
4 2 3 g gjggg 4
5 2 6 s sjsssss 6
6 4 13 b mdbctbzgcpdjbhsdctrd 3
The original base, without changes:
> head(base)
V1 V2 V3
1 2-6 c: fcpwjqhcgtffzlbj
2 6-9 x: xxxtwlxxx
3 5-6 w: wwwwlwwwh
4 7-10 q: nfbrgwqlvljgq
5 2-3 g: gjggg
6 9-11 q: qqqqqqnqgqq
The transformation I applied:
base <- read.table('base.txt')
library(tidyverse)
base <- base %>%
  separate(V1, c('min', 'max'), '-') %>%
  rename(letter = V2, pass = V3) %>%
  mutate(letter = str_replace(letter, ':', ''))
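The min and max columns start out as character because separate() splits V1 as text. A minimal sketch, assuming tidyr's separate() and its convert argument, and using the six sample lines shown above in place of base.txt, that produces integer columns directly:

```r
library(dplyr)
library(tidyr)
library(stringr)

# The six sample lines of base.txt shown above, read from a string
# instead of the file
base <- read.table(text = "
2-6 c: fcpwjqhcgtffzlbj
6-9 x: xxxtwlxxx
5-6 w: wwwwlwwwh
7-10 q: nfbrgwqlvljgq
2-3 g: gjggg
9-11 q: qqqqqqnqgqq
", stringsAsFactors = FALSE)

base <- base %>%
  # convert = TRUE runs type.convert() on the new columns,
  # so min and max come out as integers instead of character
  separate(V1, c('min', 'max'), '-', convert = TRUE) %>%
  rename(letter = V2, pass = V3) %>%
  mutate(letter = str_replace(letter, ':', ''))

class(base$min)  # "integer"
class(base$max)  # "integer"
```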
That's because you are not altering base. %>% does not assign the result to a variable, i.e.

base %>% mutate(foo = bar(x))

does not alter base. It just prints the result to the console (and prints nothing at all if you run the script with source() or call it inside a function).
You might be confusing the pipe operator with %<>% (from the magrittr package), which uses the left-hand variable as input to the pipe and overwrites that variable with the modified result.
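A small illustration of the difference, assuming the magrittr package is installed; dat is a hypothetical two-row data frame with character min and max columns like those in base:

```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

dat <- data.frame(min = c("2", "6"), max = c("6", "9"),
                  stringsAsFactors = FALSE)

# %>% only returns a modified copy; dat itself is unchanged
dat %>% mutate(across(c("min", "max"), as.numeric))
class(dat$min)  # still "character"

# %<>% pipes dat forward and assigns the result back to dat
dat %<>% mutate(across(c("min", "max"), as.numeric))
class(dat$min)  # now "numeric"
```

The explicit form dat <- dat %>% ... does the same thing as %<>% without the extra package.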
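Try assigning the converted data back to base (assigning the whole pipeline, including count(), would overwrite base with a one-row count):

base <- base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.)))

base %>%
  filter(ocor %>% between(min, max)) %>%
  count()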
As for min and max being converted back to character: I cannot reproduce that.
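As for the filtering not working as expected: between() does not seem to handle vectors for its left and right arguments. In the dplyr version used here it effectively uses only the first element of each, so every row is tested against min = 2 and max = 6 from the first row (compare the output below). A fairly new option is rowwise(), which evaluates the expression one row at a time.

Without rowwise():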
base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.))) %>%
  mutate(between(ocor, min, max))
min max letter pass ocor between(ocor, min, max)
1 2 6 c fcpwjqhcgtffzlbj 2 TRUE
2 6 9 x xxxtwlxxx 6 TRUE
3 7 10 q nfbrgwqlvljgq 2 TRUE
4 2 3 g gjggg 4 TRUE
5 2 6 s sjsssss 6 TRUE
6 4 13 b mdbctbzgcpdjbhsdctrd 3 TRUE
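Since this depends on how between() treats its bounds, a version-independent alternative is to write the comparison with >= and <=, which are vectorised element by element, just like the base R sum() that gave 582. A sketch, assuming dplyr and stringr, using the six rows of the original base shown in the question (already split into columns):

```r
library(dplyr)
library(stringr)

# The six rows of the original base from the question, already split
base <- data.frame(
  min    = c(2, 6, 5, 7, 2, 9),
  max    = c(6, 9, 6, 10, 3, 11),
  letter = c("c", "x", "w", "q", "g", "q"),
  pass   = c("fcpwjqhcgtffzlbj", "xxxtwlxxx", "wwwwlwwwh",
             "nfbrgwqlvljgq", "gjggg", "qqqqqqnqgqq"),
  stringsAsFactors = FALSE
)

res <- base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  # >= and <= compare element by element, so every row
  # is checked against its own min and max
  filter(ocor >= min & ocor <= max) %>%
  count()

res  # n = 3 for these six rows
```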
With rowwise():
base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.))) %>%
  rowwise() %>%
  mutate(between(ocor, min, max))
# A tibble: 6 x 6
# Rowwise:
min max letter pass ocor `between(ocor, min, max)`
<dbl> <dbl> <chr> <chr> <int> <lgl>
1 2 6 c fcpwjqhcgtffzlbj 2 TRUE
2 6 9 x xxxtwlxxx 6 TRUE
3 7 10 q nfbrgwqlvljgq 2 FALSE
4 2 3 g gjggg 4 FALSE
5 2 6 s sjsssss 6 TRUE
6 4 13 b mdbctbzgcpdjbhsdctrd 3 FALSE
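Applied to the original goal of counting valid rows, a sketch of the rowwise() version could look like this. It assumes dplyr and stringr and uses the six rows of the original base shown in the question; the ungroup() call is an addition that drops the row-wise grouping so count() returns a single total:

```r
library(dplyr)
library(stringr)

# The six rows of the original base from the question, already split
base <- data.frame(
  min    = c(2, 6, 5, 7, 2, 9),
  max    = c(6, 9, 6, 10, 3, 11),
  letter = c("c", "x", "w", "q", "g", "q"),
  pass   = c("fcpwjqhcgtffzlbj", "xxxtwlxxx", "wwwwlwwwh",
             "nfbrgwqlvljgq", "gjggg", "qqqqqqnqgqq"),
  stringsAsFactors = FALSE
)

res <- base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  rowwise() %>%
  # inside rowwise(), min and max are single values for each row,
  # so between() always gets scalar bounds
  filter(between(ocor, min, max)) %>%
  # remove the row-wise grouping before counting the remaining rows
  ungroup() %>%
  count()

res  # n = 3 for these six rows
```

rowwise() evaluates one row at a time, so on a large file a plain vectorised condition such as ocor >= min & ocor <= max is typically faster.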