David Smerdon

Reputation: 59

Using R to move the end of a string to the start of another variable's string for some rows

I am cleaning data in R and some of the CSVs have an unfortunate error. Occasionally, the first letter of the school character variable has instead been added to the end of the gender variable (which is normally a single character, m or f). Examples:

mydata <- data.frame(
  gender = c('m', 'm  H', 'f', 'f  C'),
  school = c('Hills College', 'ills College', 'Christian College', 'hristian College')
)

How can I identify these mistakes and move the trailing letter in gender to its rightful place at the start of school?

Upvotes: 0

Views: 601

Answers (3)

Onyambu

Reputation: 79228

You could do:

transform(mydata, 
         gender = sub("\\s+\\w+\\s*", "", gender),
         school = paste0(sub("\\w\\s*","", gender), school))
  gender            school
1      m     Hills College
2      m     Hills College
3      f Christian College
4      f Christian College
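
To see why this works, it may help to run the two patterns on the raw values. This is only a sketch: it assumes gender is always a single letter, optionally followed by whitespace and one stray letter. Inside transform(), both expressions see the original gender column, which is why the second sub() can still find the stray letter.

```r
gender <- c("m", "m  H", "f", "f  C")

# Drop the whitespace and the stray letter after it, leaving the gender code
cleaned <- sub("\\s+\\w+\\s*", "", gender)

# Drop the leading gender code and any whitespace, leaving the stray letter
# (or an empty string when there is nothing to move)
stray <- sub("\\w\\s*", "", gender)

print(cleaned)
print(stray)
```

Pasting stray in front of school then leaves the clean rows unchanged, because they get an empty prefix.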

In tidyverse, you could do:

library(tidyverse)
mydata %>%
  separate(gender, c("gender", "first_char"), fill = "right") %>%
  replace_na(list(first_char = "")) %>%
  unite(school, first_char, school, sep = "")

  gender            school
1      m     Hills College
2      m     Hills College
3      f Christian College
4      f Christian College

Upvotes: 1

Kevin_Nguyen

Reputation: 112

This may be a solution:

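library(tidyverse)
mydata %>% 
  mutate(school = if_else(str_length(gender) == 1,
                          school,
                          # prepend the stray last character of gender
                          str_c(str_sub(gender, start = -1),
                                school)),
         # school is computed first, so it still sees the original gender
         gender = str_sub(gender, 1, 1))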

Upvotes: 2

Tim Biegeleisen

Reputation: 521289

We can try using sub for a base R option:

# concatenate last letter of gender to front of school, if gender has dangling letter
# (as.character() guards against school being a factor, the data.frame() default before R 4.0)
mydata$school <- ifelse(grepl(" \\w$", mydata$gender),
                        paste0(sub("^.*(\\w)$", "\\1", mydata$gender), mydata$school),
                        as.character(mydata$school))

# remove dangling letter from gender, if present
mydata$gender <- sub("\\s+\\w$", "", mydata$gender)
mydata

  gender            school
1      m     Hills College
2      m     Hills College
3      f Christian College
4      f Christian College
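
If the same repair has to be applied to several CSVs, the two steps above could be wrapped in a small helper. This is only a sketch: the function name fix_dangling_letter is made up for illustration, and it assumes the columns are called gender and school and that a bad gender value looks like one letter, whitespace, one letter.

```r
# Hypothetical helper: move a stray trailing letter from gender to school
fix_dangling_letter <- function(df) {
  # Work on character vectors so factor columns do not leak integer codes
  gender <- as.character(df$gender)
  school <- as.character(df$school)

  # Rows where gender looks like "m  H": a letter, whitespace, a letter
  bad <- grepl("^\\w\\s+\\w$", gender)

  # Prepend the stray letter to school, then strip it from gender
  school[bad] <- paste0(sub("^.*(\\w)$", "\\1", gender[bad]), school[bad])
  gender[bad] <- sub("\\s+\\w$", "", gender[bad])

  df$gender <- gender
  df$school <- school
  df
}

mydata <- data.frame(
  gender = c("m", "m  H", "f", "f  C"),
  school = c("Hills College", "ills College",
             "Christian College", "hristian College")
)
fixed <- fix_dangling_letter(mydata)
print(fixed)
```

Indexing with bad also answers the "identify" part of the question: mydata[bad, ] would list the affected rows before anything is changed.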

Upvotes: 1
