Reputation: 59
I am cleaning data in R and some of the CSVs have an unfortunate error. Occasionally, the first letter of the school character variable has instead been added to the end of the gender variable (which is normally a single character, 'm' or 'f'). Examples:
mydata <- data.frame(
  gender = c('m', 'm H', 'f', 'f C'),
  school = c('Hills College', 'ills College', 'Christian College', 'hristian College')
)
How can I identify these mistakes and move the trailing letter in gender to its rightful place at the start of school?
Upvotes: 0
Views: 601
Reputation: 79228
You could do:
transform(mydata,
          gender = sub("\\s+\\w+\\s*", "", gender),
          school = paste0(sub("\\w\\s*", "", gender), school))
  gender            school
1      m     Hills College
2      m     Hills College
3      f Christian College
4      f Christian College
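To see why the two patterns in the transform() call work, it may help to run each sub() on the gender values alone. This is only an illustrative sketch: the names clean and stray are not part of the answer. Note that transform() evaluates both expressions against the original columns, so the second sub() still sees the uncleaned gender.

```r
# The gender values from the question
gender <- c('m', 'm H', 'f', 'f C')

# Remove the whitespace and the stray letter after it: leaves the real gender code
clean <- sub("\\s+\\w+\\s*", "", gender)
clean
# "m" "m" "f" "f"

# Remove the first character and any whitespace after it:
# leaves the stray letter, or "" when there is none
stray <- sub("\\w\\s*", "", gender)
stray
# "" "H" "" "C"
```

Pasting stray in front of school therefore leaves the correct rows unchanged and restores the missing first letter in the others.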
In tidyverse, you could do:
library(tidyverse)
mydata %>%
  separate(gender, c("gender", "first_char"), fill = "right") %>%
  replace_na(list(first_char = "")) %>%
  unite(school, first_char, school, sep = "")
  gender            school
1      m     Hills College
2      m     Hills College
3      f Christian College
4      f Christian College
Upvotes: 1
Reputation: 112
This may be a solution:
library(tidyverse)
mydata %>%
  mutate(school = if_else(str_count(gender) == 1,
                          school,
                          str_c(str_sub(gender, start = -1),
                                school)),
         # once school is repaired, keep only the first character of gender
         gender = str_sub(gender, 1, 1))
Upvotes: 2
Reputation: 521289
We can try using sub() for a base R option:
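# coerce to character first: if school is a factor (the data.frame() default before R 4.0),
# ifelse() returns integer level codes instead of the school names for the untouched rows
mydata$school <- as.character(mydata$school)
# concatenate last letter of gender to front of school, if gender has dangling letter
mydata$school <- ifelse(grepl(" \\w$", mydata$gender),
                        paste0(sub("^.*(\\w)$", "\\1", mydata$gender), mydata$school),
                        mydata$school)
# remove dangling letter from gender, if present
mydata$gender <- sub("\\s+\\w$", "", mydata$gender)
mydata
  gender            school
1      m     Hills College
2      m     Hills College
3      f Christian College
4      f Christian College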
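For the "identify" half of the question, a pattern like the one in the grepl() call can be used to flag the affected rows before anything is changed. This is a minimal sketch, assuming the stray character is always a single word character after whitespace at the end of gender; the name bad is only illustrative.

```r
# Rebuild the example data from the question as plain character columns
mydata <- data.frame(
  gender = c('m', 'm H', 'f', 'f C'),
  school = c('Hills College', 'ills College', 'Christian College', 'hristian College'),
  stringsAsFactors = FALSE
)

# TRUE where gender ends in whitespace followed by one stray word character
bad <- grepl("\\s\\w$", mydata$gender)

# Row numbers of the affected records
which(bad)
# 2 4

# Inspect the affected rows before repairing them
mydata[bad, ]
```

Checking sum(bad) before and after the repair is a quick way to confirm that every flagged row was fixed.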
Upvotes: 1