nohomejerome

Reputation: 141

R: Change column names based on a pattern

I have a quite basic R question.

The column names of my data frame all follow the same pattern:

colnames <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX","MSCI 'COUNTRY NAME 2' - PRICE INDEX",
"MSCI 'COUNTRY NAME 3' - PRICE INDEX","MSCI 'COUNTRY NAME 4' - PRICE INDEX")

Example for one country: MSCI CANADA - PRICE INDEX.

I want to change all column names to just the country name (in this case 'Canada'). Is there a quick way to remove the 'MSCI', the 'Price Index' and the capital letters?

Thanks!

Upvotes: 0

Views: 100

Answers (2)

akrun

Reputation: 887148

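An option with str_extract from stringr, using a lookbehind to grab the text that follows the first single quote, up to the next one: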

library(stringr)
str_extract(v1, "(?<=')[^']+")
#[1] "COUNTRY NAME 1" "COUNTRY NAME 2" "COUNTRY NAME 3" "COUNTRY NAME 4"

data

v1 <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX", "MSCI 'COUNTRY NAME 2' - PRICE INDEX", 
"MSCI 'COUNTRY NAME 3' - PRICE INDEX", "MSCI 'COUNTRY NAME 4' - PRICE INDEX"
)
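
If loading stringr is not an option, the same lookbehind pattern can probably be used in base R as well. The following is only a sketch under that assumption: it pairs regexpr (with perl = TRUE, which the lookbehind needs) with regmatches. The name res is just an illustrative variable. One difference to be aware of: regmatches silently drops elements that do not match, whereas str_extract returns NA for them.

```r
# Same input vector as in the data section of this answer
v1 <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX", "MSCI 'COUNTRY NAME 2' - PRICE INDEX",
        "MSCI 'COUNTRY NAME 3' - PRICE INDEX", "MSCI 'COUNTRY NAME 4' - PRICE INDEX")

# regexpr() locates the first match per element; perl = TRUE enables the lookbehind.
# regmatches() then extracts the matched substrings.
res <- regmatches(v1, regexpr("(?<=')[^']+", v1, perl = TRUE))
res
#[1] "COUNTRY NAME 1" "COUNTRY NAME 2" "COUNTRY NAME 3" "COUNTRY NAME 4"
```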

Upvotes: 0

Tim Biegeleisen

Reputation: 521289

Use sub for a base R option:

colnames <- sub("^MSCI '(.*?)'.*$", "\\1", colnames)
colnames

[1] "COUNTRY NAME 1" "COUNTRY NAME 2" "COUNTRY NAME 3" "COUNTRY NAME 4"

Data:

colnames <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX",
              "MSCI 'COUNTRY NAME 2' - PRICE INDEX",
              "MSCI 'COUNTRY NAME 3' - PRICE INDEX",
              "MSCI 'COUNTRY NAME 4' - PRICE INDEX")

If the country names really don't have single quotes around them, then use this version:

name <- "MSCI CANADA - PRICE INDEX"
country <- sub("^MSCI (.*?) - PRICE INDEX$", "\\1", name)
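
The question also asks to get rid of the capital letters (e.g. 'Canada' rather than 'CANADA') and to rename the actual columns. A minimal sketch of how that might look, assuming the unquoted pattern; the data frame df and its contents are made up purely for illustration:

```r
# Hypothetical data frame whose column names follow the unquoted pattern
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("MSCI CANADA - PRICE INDEX",
               "MSCI UNITED STATES - PRICE INDEX",
               "MSCI JAPAN - PRICE INDEX")

# 1. Keep only the text between "MSCI " and " - PRICE INDEX"
country <- sub("^MSCI (.*?) - PRICE INDEX$", "\\1", names(df))

# 2. Lower-case everything, then upper-case the first letter of each word
#    (\\U in the replacement requires perl = TRUE)
country <- gsub("\\b([a-z])", "\\U\\1", tolower(country), perl = TRUE)

# 3. Assign the cleaned names back to the data frame
names(df) <- country
names(df)
#[1] "Canada"        "United States" "Japan"
```

Note that this capitalises every word, so a name containing a small word such as "of" or "and" would come out with that word capitalised too; adjust the second step if that matters for your data.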

Upvotes: 1
