Magliari

Reputation: 15

R dplyr: Problem converting a column from character to integer using dplyr

I am having a problem with the following script. When I convert the min and max columns of the data.frame base from character to numeric using dplyr, they seem to "convert" back to character. The result, which should be 582, ends up being 513.

base%>%
  mutate(ocor=str_count(pass,letter))%>%
  filter(ocor%>%between(min,max))%>%
  count()

To correct the problem, I tried converting the variables inside the dplyr pipeline. However, they seem to convert back.

base%>%   
  mutate(ocor=str_count(pass,letter))%>%   
  mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.)))%>%   
  filter(ocor%>%between(min,max))%>%   
  count() 
class(base$max) 
class(base$min)

    n
1 513
> class(base$max) 
[1] "character"
> class(base$min)
[1] "character"

Without dplyr I get the correct result, for example:

a<-base%>%
  mutate(ocor=str_count(pass,letter))%>%
  select(ocor)
class(base$max)
class(base$min)
base$max<-as.integer(base$max)
base$min<-as.integer(base$min)
sum(a >= base$min & a <= base$max)

[1] 582

I can't understand what's going on. An example of the database for clarification:

head(base)

  min max letter                 pass ocor
1   2   6      c     fcpwjqhcgtffzlbj    2
2   6   9      x            xxxtwlxxx    6
3   7  10      q        nfbrgwqlvljgq    2
4   2   3      g                gjggg    4
5   2   6      s              sjsssss    6
6   4  13      b mdbctbzgcpdjbhsdctrd    3

The original base, without changes:

 > head(base)
    V1 V2               V3
1  2-6 c: fcpwjqhcgtffzlbj
2  6-9 x:        xxxtwlxxx
3  5-6 w:        wwwwlwwwh
4 7-10 q:    nfbrgwqlvljgq
5  2-3 g:            gjggg
6 9-11 q:      qqqqqqnqgqq

The changes:

base<-read.table('base.txt')
library(tidyverse)
base<-base%>%
  separate(V1,c('min','max'),'-')%>%
  rename(letter=V2,pass=V3)%>%
  mutate(letter = str_replace(letter,':',''))

Upvotes: 0

Views: 578

Answers (1)

MrGumble

Reputation: 5776

That's because you are not altering base.

%>% does not assign the result to a variable. I.e.

base %>% mutate(foo=bar(x))

does not alter base. It just prints the result to the console (and prints nothing if you are running the script or calling it from inside a function).

You might be confusing the pipe-operator with %<>% (found in the package magrittr) which uses the left-hand variable as input for the pipe, and overwrites the variable with the modified result.
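As a minimal sketch of the difference, the snippet below uses a small hypothetical data frame df (not the asker's data) whose min and max columns are stored as character. It assumes dplyr and magrittr are installed.

```r
library(dplyr)
library(magrittr)  # provides the assignment pipe %<>%

# Hypothetical toy data: min and max stored as character, as in the question
df <- data.frame(min = c("2", "6"), max = c("6", "9"), stringsAsFactors = FALSE)

# Piping without assignment: the converted copy is returned, then discarded
df %>% mutate(across(c("min", "max"), as.numeric))
class(df$min)   # still "character"

# Assigning the result back to df keeps the conversion
df <- df %>% mutate(across(c("min", "max"), as.numeric))
class(df$min)   # "numeric"

# The assignment pipe overwrites the left-hand variable in one step
df2 <- data.frame(min = c("2", "6"), max = c("6", "9"), stringsAsFactors = FALSE)
df2 %<>% mutate(across(c("min", "max"), as.numeric))
class(df2$max)  # "numeric"
```

Plain assignment with <- is arguably easier to spot when reading a script than %<>%, so which form to use is a matter of taste.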

Try

base <- base%>%   
  mutate(ocor=str_count(pass,letter))%>%   
  mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.)))%>%   
  filter(ocor%>%between(min,max))%>%   
  count() 

Re. the issue with min and max being converted back to characters, I cannot reproduce.

Re. the issue with filtering not working as expected: between doesn't seem to handle vectors for its left and right arguments. A fairly new option is rowwise, which evaluates between separately for each row:

Without rowwise:

base%>%   
    mutate(ocor=str_count(pass,letter))%>%   
    mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.)))%>%   
    mutate(between(ocor, min,max))
  min max letter                 pass ocor between(ocor, min, max)
1   2   6      c     fcpwjqhcgtffzlbj    2                    TRUE
2   6   9      x            xxxtwlxxx    6                    TRUE
3   7  10      q        nfbrgwqlvljgq    2                    TRUE
4   2   3      g                gjggg    4                    TRUE
5   2   6      s              sjsssss    6                    TRUE
6   4  13      b mdbctbzgcpdjbhsdctrd    3                    TRUE

With rowwise:

base%>%   
    mutate(ocor=str_count(pass,letter))%>%   
    mutate(across(.cols = c('min', 'max'), .fns = ~ as.numeric(.)))%>%   
    rowwise %>% mutate(between(ocor, min,max))
# A tibble: 6 x 6
# Rowwise: 
    min   max letter pass                  ocor `between(ocor, min, max)`
  <dbl> <dbl> <chr>  <chr>                <int> <lgl>                    
1     2     6 c      fcpwjqhcgtffzlbj         2 TRUE                     
2     6     9 x      xxxtwlxxx                6 TRUE                     
3     7    10 q      nfbrgwqlvljgq            2 FALSE                    
4     2     3 g      gjggg                    4 FALSE                    
5     2     6 s      sjsssss                  6 TRUE                     
6     4    13 b      mdbctbzgcpdjbhsdctrd     3 FALSE     
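An alternative that avoids rowwise is to write the range check with >= and <=, which compare element by element. This mirrors the base R comparison in the question. The sketch below rebuilds the six example rows by hand, with min and max already numeric, so it only illustrates the approach and is not the full data set.

```r
library(dplyr)
library(stringr)

# The six example rows from the question, with min and max already numeric
base <- data.frame(
  min    = c(2, 6, 7, 2, 2, 4),
  max    = c(6, 9, 10, 3, 6, 13),
  letter = c("c", "x", "q", "g", "s", "b"),
  pass   = c("fcpwjqhcgtffzlbj", "xxxtwlxxx", "nfbrgwqlvljgq",
             "gjggg", "sjsssss", "mdbctbzgcpdjbhsdctrd"),
  stringsAsFactors = FALSE
)

result <- base %>%
  mutate(ocor = str_count(pass, letter)) %>%
  # >= and <= are vectorised, so each row is checked against its own bounds
  filter(ocor >= min, ocor <= max) %>%
  count()

result$n  # 3 for these six rows
```

The three rows kept here are the same three that the rowwise version marks TRUE.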

           

Upvotes: 1
