sbliss

Reputation: 87

R: split a character string into multiple columns when the strings have different lengths (dplyr)

I have animal tracking data where each animal was encountered over time and the sex was recorded at each encounter. There are three types of encounters (type1, type2, and type3). Each row represents an animal, and each encounter is classified as M (male) or F (female). Each character in a type string represents one encounter (e.g., MMMM is an animal seen four times and recorded as male each time).

Sample data:

animal.ID    type1         type2       type3
1            MMMMMMM       M           M
2            MFMM          M           M
3            FFM           F           F
4            FFFFFFFFF     F           F  
5            MM            M           M

I want to know if the sex (male or female) was recorded consistently for each animal.

I want to produce something like this, where a column indicates whether sex was recorded consistently (1) or not (0).

animal.ID    type1         type2       type3    consistent
1            MMMMMMM       M           M         1
2            MFMM          M           M         0
3            FFM           F           F         0
4            FFFFFFFFF     F           F         1
5            MM            M           M         1

I can use if_else to get the 'consistent' column for the type2 and type3 data:

df %>%
   mutate(consistent = if_else(type2 == type3, 1, 0))

But I can't include the type1 data, since each string has multiple characters and the strings differ in length.

One approach could be to use str_split to split type1 into multiple columns, but I don't know how to do that given the different number of characters in each string.

Upvotes: 1

Views: 110

Answers (3)

Yuriy Saraykin

Reputation: 8880

Another solution, using @Ronak Shah's logic but applied to all three type columns:

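library(tidyverse)
df %>%
      # paste type1, type2 and type3 into one string per animal
      unite("all_type", starts_with("type"), sep = "", remove = FALSE) %>%
      # 1 if the combined string has a single distinct character, else 0
      mutate(consistent = map_int(strsplit(all_type, ""), ~ +(n_distinct(.x) == 1)))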
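For comparison, here is a minimal base R sketch of the same idea (paste the three columns together, then count the distinct characters per row). It assumes the sample data from the question; the variable name all_type just mirrors the column name used above.

```r
# Sample data from the question
df <- data.frame(
  animal.ID = 1:5,
  type1 = c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM"),
  type2 = c("M", "M", "F", "F", "M"),
  type3 = c("M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Concatenate all encounter strings for each animal
all_type <- paste0(df$type1, df$type2, df$type3)

# 1 if every character in the combined string is the same, otherwise 0
df$consistent <- vapply(
  strsplit(all_type, ""),
  function(chars) as.integer(length(unique(chars)) == 1L),
  integer(1)
)

df$consistent
```

vapply is used here instead of sapply so the result is guaranteed to be an integer vector, which keeps consistent an ordinary column rather than a list column.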

Upvotes: 0

Ronak Shah

Reputation: 388982

We can use charToRaw to get the "raw" (byte) representation of type1 and assign 1 if all the bytes are the same.

df$consistent <- +(sapply(df$type1, function(x) length(unique(charToRaw(x)))) == 1)

With dplyr, the same logic can be written as:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(consistent = +(n_distinct(charToRaw(type1)) == 1))


#  animal.ID type1     type2 type3 consistent
#      <int> <chr>     <chr> <chr>      <int>
#1         1 MMMMMMM   M     M              1
#2         2 MFMM      M     M              0
#3         3 FFM       F     F              0
#4         4 FFFFFFFFF F     F              1
#5         5 MM        M     M              1

data

df <- structure(list(animal.ID = 1:5, type1 = c("MMMMMMM", "MFMM", 
"FFM", "FFFFFFFFF", "MM"), type2 = c("M", "M", "F", "F", "M"), 
type3 = c("M", "M", "F", "F", "M")), class = "data.frame", row.names = c(NA, -5L))
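To illustrate what charToRaw is doing, and one possible way to extend the check to type2 and type3 as well, here is a rough sketch. The helper is_uniform is a hypothetical name introduced only for this example, and the approach assumes the strings contain plain ASCII letters such as M and F.

```r
# charToRaw() returns one byte per ASCII character of a single string
bytes <- charToRaw("MFMM")
as.integer(bytes)        # 77 70 77 77 ("M" is 77, "F" is 70)
length(unique(bytes))    # 2, so this string is not consistent

# Hypothetical helper: TRUE when a string is one byte repeated
is_uniform <- function(x) length(unique(charToRaw(x))) == 1L

# Sample data from the question
type1 <- c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM")
type2 <- c("M", "M", "F", "F", "M")
type3 <- c("M", "M", "F", "F", "M")

# Check all three columns at once by pasting them together first
combined <- paste0(type1, type2, type3)
consistent <- +vapply(combined, is_uniform, logical(1), USE.NAMES = FALSE)

consistent
```

Because charToRaw only looks at a single string, the helper is applied element by element, which is the same reason sapply or rowwise is needed above.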

Upvotes: 1

Ben

Reputation: 30474

One approach may be to use strsplit and unlist, checking that all characters are equal to type2 (in addition to checking that type2 equals type3).

df %>%
  rowwise() %>%
  mutate(consistent = ifelse(type2 == type3 & all(unlist(strsplit(type1, "")) == type2), 1, 0))

Output

# A tibble: 5 x 5
  animal.ID type1     type2 type3 consistent
      <int> <chr>     <chr> <chr>      <dbl>
1         1 MMMMMMM   M     M              1
2         2 MFMM      M     M              0
3         3 FFM       F     F              0
4         4 FFFFFFFFF F     F              1
5         5 MM        M     M              1
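As a rough illustration of why the different string lengths are not a problem here: strsplit returns a list with one character vector per string, so nothing has to be padded into columns. The sketch below uses the sample data from the question and replaces rowwise with mapply; that swap is only one possible way to write it, not part of the answer above.

```r
# Sample data from the question
type1 <- c("MMMMMMM", "MFMM", "FFM", "FFFFFFFFF", "MM")
type2 <- c("M", "M", "F", "F", "M")
type3 <- c("M", "M", "F", "F", "M")

# One character vector per animal, each with its own length
chars <- strsplit(type1, "")
lengths(chars)   # 7 4 3 9 2

# For each animal, check that every type1 character equals its type2 value;
# the single type2 value is recycled across the type1 characters
type1_matches <- mapply(function(x, ref) all(x == ref), chars, type2)

# Consistent only if type2 equals type3 and all of type1 agrees with them
consistent <- as.integer(type2 == type3 & type1_matches)

consistent
```

If separate columns are still wanted for other reasons, the list in chars would need to be padded with NA up to the longest length first, but the consistency check itself does not require that step.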

Upvotes: 3
