Reputation: 438
My dataset looks like this:
Id   Col1
--------------------
133  Mary 7E
281  Feliz 2D
437  Albert 4C
What I am trying to do is take the first two characters of the first word in Col1 and the whole second word, and then merge them, with the letters in upper case.
My final expected dataset should look like this:
Id   Col1
--------------------
133  MA7E
281  FE2D
437  AL4C
Any suggestions on how to accomplish this are much appreciated.
Upvotes: 1
Views: 876
Reputation: 4970
For this solution, use substr to take the first 2 characters of each string and the last 2. Selecting the last 2 needs nchar to get the length of each string. Then paste0 the pieces together, using toupper to get capital letters. This assumes the second word is always exactly two characters long.
l2 <- nchar(df$Col1)  # length of each string; nchar is already vectorized, so sapply is not needed
paste0(toupper(substr(df$Col1, 1, 2)), substr(df$Col1, l2 - 1, l2))
[1] "MA7E" "FE2D" "AL4C"
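Taking the last two characters only works while the second word has exactly two characters. A minimal base R sketch that instead keeps everything after the first space, assuming each value contains exactly one space (x below is just the sample data as a plain character vector):

```r
# Sample values from the question, as a plain character vector
x <- c("Mary 7E", "Feliz 2D", "Albert 4C")

# Position of the space in each string; fixed = TRUE treats " " literally
space_pos <- regexpr(" ", x, fixed = TRUE)

# First two letters of the first word (upper-cased) + everything after the space
result <- paste0(toupper(substr(x, 1, 2)), substr(x, space_pos + 1, nchar(x)))
result
```

Because the end position is nchar(x), a longer second word such as "10AB" would be kept whole.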
Upvotes: 0
Reputation: 711
Rather than a one-line solution, this one is easy to interpret and modify:
library(dplyr)
library(stringi)
xx_df <- data.frame(id = c(133, 281, 437),
                    Col1 = c("Mary 7E", "Feliz 2D", "Albert 4C"))
xx_df %>%
  # split Col1 at the space into the first and second word
  mutate(xpart1 = stri_split_fixed(Col1, " ", simplify = TRUE)[, 1]) %>%
  mutate(xpart2 = stri_split_fixed(Col1, " ", simplify = TRUE)[, 2]) %>%
  # first two letters of the first word + the whole second word
  mutate(Col1_new = paste0(substr(xpart1, 1, 2), xpart2)) %>%
  select(id, Col1 = Col1_new) %>%
  mutate(Col1 = toupper(Col1))
The result is:
id Col1
1 133 MA7E
2 281 FE2D
3 437 AL4C
Upvotes: 1
Reputation: 7724
You can do this with a single sub call:
my_data$Col1 <- sub("(\\w{2})(\\w* )(\\b\\w+\\b)", "\\1\\3", my_data$Col1)
my_data$Col1 <- toupper(my_data$Col1)
my_data
# Id Col1
# 1 133 MA7E
# 2 281 FE2D
# 3 437 AL4C
The parentheses define the capture groups that are matched; only the first and the third are kept in the replacement "\\1\\3". \\w matches letters, digits and the underscore, and \\b matches a word boundary.
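To see what each capture group holds, a small sketch using base R's regexec and regmatches on the sample values (the vector x is an assumption standing in for my_data$Col1):

```r
x <- c("Mary 7E", "Feliz 2D", "Albert 4C")
pattern <- "(\\w{2})(\\w* )(\\b\\w+\\b)"

# regexec() records the full match plus every capture group;
# regmatches() extracts them as one character vector per input string
parts <- regmatches(x, regexec(pattern, x))

# One row per string: full match, group 1, group 2, group 3
groups <- do.call(rbind, parts)
groups
```

Column 2 holds the first two letters ("Ma", "Fe", "Al"), column 3 the rest of the first word plus the space, and column 4 the second word, which is why "\\1\\3" in sub keeps exactly the wanted parts.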
Upvotes: 2
Reputation: 12155
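We can also paste0 together the output of substr and str_split_fixed within a dplyr pipe chain:
library(dplyr)
library(stringr)
df <- data.frame(id = c(133, 281, 437),
                 Col1 = c("Mary 7E", "Feliz 2D", "Albert 4C"))
df %>%
  # str_split_fixed(..., n = 2) returns a matrix; column 2 is the second word of every row
  # (str_split(Col1, ' ')[[1]][-1] would reuse the first row's "7E" for all rows)
  mutate(Col1 = toupper(paste0(substr(Col1, 1, 2),
                               str_split_fixed(Col1, " ", 2)[, 2])))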
Upvotes: 2
Reputation: 520928
Here is another variation using sub
. We can use lookarounds in Perl mode to selectively remove everything except for the first two, and last two, characters. Then, make a call to toupper()
to capitalize all letters.
df$Col1 <- toupper(sub("(?<=^..).*(?=..$)", "", df$Col1, perl = TRUE))  # perl = TRUE is an argument of sub(), not toupper()
df$Col1
[1] "MA7E" "FE2D" "AL4C"
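A quick way to check what the lookarounds match is to extract the part that sub deletes. This sketch uses the sample values as a plain vector x (an assumption for illustration); like the answer, it keeps the first two and the last two characters, so it assumes a two-character second word:

```r
x <- c("Mary 7E", "Feliz 2D", "Albert 4C")

# (?<=^..) starts the match after the first two characters and
# (?=..$) stops it two characters before the end; lookarounds need perl = TRUE
middle <- regmatches(x, regexpr("(?<=^..).*(?=..$)", x, perl = TRUE))
middle

# Deleting that middle part and upper-casing gives the wanted values
result <- toupper(sub("(?<=^..).*(?=..$)", "", x, perl = TRUE))
result
```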
Upvotes: 1
Reputation: 70623
You can do this in several steps: first split by the space, take the first two letters of the name and capitalize them, then paste that together with the second part. The result is in column final. You can keep all these intermediate steps or chain the commands into fewer statements, whatever floats your boat.
xy <- data.frame(id = c(133, 281, 437),
name = c("Mary 7E", "Feliz 2D", "Albert 4C"),
stringsAsFactors = FALSE)
xy$first <- sapply(strsplit(xy$name, " "), "[", 1)
xy$second <- sapply(strsplit(xy$name, " "), "[", 2)
xy$first_upper <- toupper(substr(x = xy$first, start = 1, stop = 2))
xy$final <- paste(xy$first_upper, xy$second, sep = "")
xy
id name first second first_upper final
1 133 Mary 7E Mary 7E MA MA7E
2 281 Feliz 2D Feliz 2D FE FE2D
3 437 Albert 4C Albert 4C AL AL4C
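As a sketch of the "fewer statements" option, the same steps can skip the intermediate columns and split each name only once. The data frame is rebuilt here so the block runs on its own; the helper variables parts, first and second are illustrative names:

```r
xy <- data.frame(id = c(133, 281, 437),
                 name = c("Mary 7E", "Feliz 2D", "Albert 4C"),
                 stringsAsFactors = FALSE)

# Split each name at the space once and reuse the result
parts <- strsplit(xy$name, " ")
first <- vapply(parts, `[`, character(1), 1)   # first word
second <- vapply(parts, `[`, character(1), 2)  # second word

# First two letters of the first word, upper-cased, glued to the second word
xy$final <- paste0(toupper(substr(first, 1, 2)), second)
xy
```

vapply is used instead of sapply so the result is guaranteed to be a character vector with one element per row.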
Upvotes: 1