Kim Jenkins

Reputation: 438

R: String split and merge

My dataset looks like this:

 Id       Col1
 --------------------
 133      Mary 7E
 281      Feliz 2D
 437      Albert 4C

I want to take the first two characters of the first word in Col1 (in upper case) and the whole second word, then merge them into a single string.

The expected result looks like this:

 Id       Col1
 --------------------
 133      MA7E
 281      FE2D
 437      AL4C

Any suggestions on how to accomplish this are much appreciated.

Upvotes: 1

Views: 876

Answers (6)

milan

Reputation: 4970

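This solution uses substr to take the first two characters of each string and the last two. nchar gives the length of each string, which is needed to locate the last two characters; it is vectorised, so no sapply is required. The pieces are joined with paste0, and toupper capitalizes the letters taken from the name. Note that this assumes the second word is always exactly two characters long.

# Length of each string, used to find the last two characters
l2 <- nchar(df$Col1)
paste0(toupper(substr(df$Col1, 1, 2)), substr(df$Col1, l2 - 1, l2))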

[1] "MA7E" "FE2D" "AL4C"

Upvotes: 0

Selcuk Akbas

Reputation: 711

Rather than a one-line solution, this version works in separate steps, which makes it easy to interpret and modify.

library(dplyr)    # %>%, mutate(), select()
library(stringi)  # stri_split_fixed()

xx_df <- data.frame(id = c(133,281,437),
                 Col1 = c("Mary 7E", "Feliz 2D", "Albert 4C"))

# Split on the space, keep two letters of the first word and the whole second word
xx_df %>% 
  mutate(xpart1 = stri_split_fixed(Col1, " ", simplify = TRUE)[,1]) %>% 
  mutate(xpart2 = stri_split_fixed(Col1, " ", simplify = TRUE)[,2])  %>% 
  mutate(Col1_new = paste0(substr(xpart1,1,2), xpart2)) %>% 
  select(id, Col1 = Col1_new) %>% 
  mutate(Col1 = toupper(Col1))

The result is:

   id Col1
1 133 MA7E
2 281 FE2D
3 437 AL4C

Upvotes: 1

kath

Reputation: 7724

You can do

my_data$Col1 <- sub("(\\w{2})(\\w* )(\\b\\w+\\b)", "\\1\\3", my_data$Col1)
my_data$Col1 <- toupper(my_data$Col1)

my_data
#    Id Col1
# 1 133 MA7E
# 2 281 FE2D
# 3 437 AL4C

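The parentheses define three capture groups: the first two characters of the first word, the rest of that word plus the space, and the second word. Only the first and third groups are kept in the replacement. \\w matches letters, digits and the underscore, and \\b matches a word boundary.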
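
To see what each group captures, here is a small sketch using base R's regexec and regmatches on the sample values from the question. The variable names (x, pattern, groups, result) are only for illustration and are not part of the answer above.

```r
# Sample values from the question
x <- c("Mary 7E", "Feliz 2D", "Albert 4C")

# Same pattern as in the answer above
pattern <- "(\\w{2})(\\w* )(\\b\\w+\\b)"

# regexec() + regmatches() return, for each string, the full match
# followed by the text captured by each group
groups <- regmatches(x, regexec(pattern, x))
groups[[1]]

# Keep only groups 1 and 3, then convert to upper case
result <- toupper(sub(pattern, "\\1\\3", x))
result
```

Inspecting groups[[1]] shows the full match first and then the three captured pieces, which makes it easier to adjust the pattern if the data has a different shape.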

Upvotes: 2

divibisan

Reputation: 12155

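We can also paste0 together the output of substr and str_split_fixed within a dplyr pipe chain. str_split_fixed returns a matrix with one row per input string, so taking its second column gives the second word of every row:

library(dplyr)
library(stringr)

df <- data.frame(id = c(133,281,437),
                 Col1 = c("Mary 7E", "Feliz 2D", "Albert 4C"),
                 stringsAsFactors = FALSE)

df %>%
    mutate(Col1 = toupper(paste0(substr(Col1, 1, 2),
                                 str_split_fixed(Col1, ' ', 2)[, 2])))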

Upvotes: 2

Tim Biegeleisen

Reputation: 520928

Here is another variation using sub. We can use lookarounds in Perl mode to selectively remove everything except for the first two, and last two, characters. Then, make a call to toupper() to capitalize all letters.

df$Col1 <- toupper(sub("(?<=^..).*(?=..$)", "", df$Col1, perl = TRUE))

[1] "MA7E" "FE2D" "AL4C"
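
One thing to keep in mind is that the lookaround pattern keeps exactly the last two characters, so it relies on the second word always being two characters long. The sketch below illustrates this with a hypothetical value, "Mary 12E", which is not in the original data, and shows one possible alternative pattern (my own assumption, not part of the answer above) that keeps the whole second word instead.

```r
# Sample values from the question
x <- c("Mary 7E", "Feliz 2D", "Albert 4C")

# Lookaround version: keep the first two and the last two characters
keep_ends <- toupper(sub("(?<=^..).*(?=..$)", "", x, perl = TRUE))

# Hypothetical input whose second word has three characters
y <- "Mary 12E"
ends_only <- toupper(sub("(?<=^..).*(?=..$)", "", y, perl = TRUE))

# Alternative sketch: capture the first two characters, then drop the
# rest of the first word and the whitespace that follows it
whole_second <- toupper(sub("^(.{2})\\S*\\s+", "\\1", y, perl = TRUE))

keep_ends
ends_only
whole_second
```

For the sample data both patterns give the same result; they only differ when the second word is longer or shorter than two characters.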


Upvotes: 1

Roman Luštrik

Reputation: 70623

You can do this in several steps. First split each string on the space, then take the first two letters of the name and capitalize them. Paste that together with the second part. The result is in the column final. You can keep all of these intermediate steps or chain the commands into fewer statements, whichever you prefer.

xy <- data.frame(id = c(133, 281, 437),
                 name = c("Mary 7E", "Feliz 2D", "Albert 4C"),
                 stringsAsFactors = FALSE)

xy$first <- sapply(strsplit(xy$name, " "), "[", 1)
xy$second <- sapply(strsplit(xy$name, " "), "[", 2)
xy$first_upper <- toupper(substr(x = xy$first, start = 1, stop = 2))
xy$final <- paste(xy$first_upper, xy$second, sep = "")
xy

   id      name  first second first_upper final
1 133   Mary 7E   Mary     7E          MA  MA7E
2 281  Feliz 2D  Feliz     2D          FE  FE2D
3 437 Albert 4C Albert     4C          AL  AL4C
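
As a rough sketch of the "fewer statements" variant mentioned above, the split can be done once and the pieces combined inside vapply. The helper variable parts is introduced here for illustration only.

```r
xy <- data.frame(id = c(133, 281, 437),
                 name = c("Mary 7E", "Feliz 2D", "Albert 4C"),
                 stringsAsFactors = FALSE)

# Split each name once on the space
parts <- strsplit(xy$name, " ", fixed = TRUE)

# For each pair: upper-case the first two letters of the first word
# and append the second word unchanged
xy$final <- vapply(parts,
                   function(p) paste0(toupper(substr(p[1], 1, 2)), p[2]),
                   character(1))
xy
```

This avoids splitting the same column twice, at the cost of hiding the intermediate columns that make the step-by-step version easy to inspect.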

Upvotes: 1
