Denis Efimov

Reputation: 115

How to transform long names into shorter (two-part) names

I have a character vector of long names, each consisting of several words joined by dots.

x <- c("Duschekia.fruticosa..Rupr...Pouzar",
       "Betula.nana.L.",
       "Salix.glauca.L.",
       "Salix.jenisseensis..F..Schmidt..Flod.",
       "Vaccinium.minus..Lodd...Worosch")

The names vary in length, but only the first two words of each name matter.

My goal is to get names of at most 7 characters: the first 3 characters of each of the first two words, separated by a dot.

These questions come very close to what I need, but I do not know how to adapt their code to my case: "R How to remove characters from long column names in a data frame" and "how to append names to 'column names' of the output data frame in R?"

What should I do to get output names that look like this?

x <- c("Dus.fru",
       "Bet.nan",
       "Sal.gla",
       "Sal.jen",
       "Vac.min")

Any help would be appreciated.

Upvotes: 3

Views: 386

Answers (3)

Esben Eickhardt

Reputation: 3862

Here is a less elegant solution than kath's, but one that is easier to read if you are not a regex expert.

# Your data
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
       "Betula.nana.L.",
       "Salix.glauca.L.",
       "Salix.jenisseensis..F..Schmidt..Flod.",
       "Vaccinium.minus..Lodd...Worosch")

# A function that takes three characters from first two words and merges them    
cleaner_fun <- function(ugly_string) {
  words <- strsplit(ugly_string, "\\.")[[1]]
  short_words <- substr(words, 1, 3)
  new_name <- paste(short_words[1:2], collapse = ".")
  return(new_name)
}

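# Testing function (USE.NAMES = FALSE stops sapply from labelling the result with the original strings)
sapply(x, cleaner_fun, USE.NAMES = FALSE)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"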

Upvotes: 1

zx8754

Reputation: 56219

Split on dot, substring 3 characters, then paste back together:

sapply(strsplit(x, ".", fixed = TRUE), function(i) {
  paste(substr(i[1], 1, 3), substr(i[2], 1, 3), sep = ".")
})
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
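
One thing to watch with the split-and-substring approach: a name with only one word has no second piece, so i[2] is NA and the result looks like "Sal.NA". The following is a minimal sketch of a variant that tolerates such names; shorten_name and its n_chars argument are hypothetical names introduced here for illustration, and dropping empty pieces is an assumption about how consecutive dots should be treated.

```r
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
       "Betula.nana.L.",
       "Salix.glauca.L.",
       "Salix.jenisseensis..F..Schmidt..Flod.",
       "Vaccinium.minus..Lodd...Worosch")

# Hypothetical helper: abbreviate the first (up to) two words of one name
shorten_name <- function(name, n_chars = 3) {
  parts <- strsplit(name, ".", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]                     # drop empty pieces left by consecutive dots
  parts <- parts[seq_len(min(2, length(parts)))]    # keep at most the first two words
  paste(substr(parts, 1, n_chars), collapse = ".")
}

# vapply checks that every result is a single string
short <- vapply(x, shorten_name, character(1), USE.NAMES = FALSE)
short
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
```

With this version a one-word name such as "Salix" comes back as "Sal" rather than "Sal.NA".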

Upvotes: 3

kath

Reputation: 7724

You can do the following:

gsub("(\\w{1,3})[^\\.]*\\.(\\w{1,3}).*", "\\1.\\2", x)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"

First we capture up to 3 word characters with (\\w{1,3}), then skip any remaining characters that are not a dot with [^\\.]*, match the dot itself with \\., and capture up to 3 word characters again with (\\w{1,3}). The trailing .* consumes everything after that, so the whole string gets replaced. In the replacement, \\1.\\2 keeps only the two captured groups (the parts in parentheses) and joins them with a dot.
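
To see what the two groups capture, it can help to pull them out explicitly. This is only an illustrative sketch using base R's regexec and regmatches with the same pattern; the variable names pattern, groups and short are made up for the example.

```r
x <- c("Duschekia.fruticosa..Rupr...Pouzar", "Betula.nana.L.")
pattern <- "(\\w{1,3})[^\\.]*\\.(\\w{1,3}).*"

# For each string: element 1 is the full match, elements 2 and 3 are the captured groups
groups <- regmatches(x, regexec(pattern, x))
groups[[1]][2:3]
# [1] "Dus" "fru"

# Joining the two groups with a dot gives the same result as the gsub call
short <- vapply(groups, function(m) paste(m[2], m[3], sep = "."), character(1))
short
# [1] "Dus.fru" "Bet.nan"
```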

Upvotes: 8
