MLS

Reputation: 112

Replace Column Names With String Right of "_"

I have a dataframe (d3) in which some column names have the form "Date_Month.Year". I want to replace those column names with just "Month.Year", so that multiple columns sharing the same "Month.Year" end up as a single summed column.

Below is the code I tried and its output:

library(stringr)

print(colnames(d3))
 #below is output of the print statement
 #[1] "ProductCategoryDesc" "RegionDesc"          "SourceDesc"          "variable"           
 #[5] "2019-02-28_Feb.2019" "2019-03-01_Mar.2019" "2019-03-04_Mar.2019" "2019-03-05_Mar.2019"
 #[9] "2019-03-06_Mar.2019" "2019-03-07_Mar.2019" "2019-03-08_Mar.2019" 

d3 <- d3 %>% mutate(col = str_remove(col, '*._'))

Here is the error I get: Evaluation error: argument str should be a character vector (or an object coercible to).

The first part of my problem is now answered: I used the line below to get all column names into Month.Year format. However, I am now having issues summing the columns that share a name. For that I looked at "Sum and replace columns with same name R for a data frame containing different classes".

colnames(d3) <- gsub('.*_', '', colnames(d3))

Below is the code I used to sum the columns that have a duplicate name. However, this code does not necessarily put the summed values in the correct columns.

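indx <- sapply(d3, is.numeric)  # check which columns are numeric
nm1 <- which(indx)  # positions of the numeric columns, named by column
# flag numeric columns whose name occurs more than once
indx2 <- duplicated(names(nm1)) | duplicated(names(nm1), fromLast = TRUE)
nm2 <- nm1[indx2]
indx3 <- duplicated(names(nm2))  # TRUE for every repeat after the first occurrence
# write the row sums of each duplicate group into the first column of that name
d3[nm2[!indx3]] <- Map(function(x, y) rowSums(x[y], na.rm = FALSE),
                        list(d3), split(nm2, names(nm2)))
d3 <- d3[-nm2[indx3]]  # drop the remaining duplicate columns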

Upvotes: 0

Views: 1479

Answers (2)

Krzysztof Nowicki

Reputation: 100

colnames(d3) <- sapply(colnames(d3), function(colname){
    return( str_remove(colname, '.*_') )
})

The regex should be ".*_" to match the case you need: the quantifier * has to come after the . it applies to.
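As a quick check of what that pattern does, here is a minimal sketch using base R's sub() rather than str_remove(), so it does not assume stringr is installed. The first three names are copied from the question; "a_b_c" is a hypothetical name added only to show that .* is greedy and strips everything up to the last underscore.

```r
# Names taken from the question, plus one hypothetical name with two underscores
nms <- c("RegionDesc", "2019-02-28_Feb.2019", "2019-03-01_Mar.2019", "a_b_c")

# ".*_" matches any run of characters followed by "_"; because ".*" is greedy,
# the match extends to the LAST underscore in each name
cleaned <- sub(".*_", "", nms)

# Names without an underscore are left unchanged
print(cleaned)
# cleaned now holds: RegionDesc, Feb.2019, Mar.2019, c
```

Both sub() and str_remove() are vectorized over the names, so the pattern can be applied to colnames(d3) in one call.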

Upvotes: 0

divibisan

Reputation: 12165

mutate() changes the values inside columns, not the column names. If you want to change the column names, you should assign to colnames:

colnames(d3) <- gsub('.*_', '', colnames(d3))

Note that in a regex, quantifiers (i.e. *) go after the thing they quantify, so the pattern should be .*_ rather than *._

An example where we remove text before a . in iris:

colnames(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

# In regex, . means any character, so to match an actual '.',
#   we need to 'escape' it with \\.
colnames(iris) <- gsub('.*\\.', '', colnames(iris))

colnames(iris)
[1] "Length"  "Width"   "Length"  "Width"   "Species"
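Renaming this way leaves duplicated column names, which is the situation the question then wants to collapse by summing. One likely reason the Map()/split() code in the question misplaces values is that split() returns its groups sorted alphabetically by name, while the target columns are picked in order of first appearance. A minimal sketch that keeps both in the same order is below; the toy data frame and the helper names (is_num, groups, summed, result) are invented for illustration and are not from the question.

```r
# Hypothetical toy data standing in for d3 (values invented for illustration)
d3 <- data.frame(RegionDesc = c("East", "West"),
                 a = c(1, 2), b = c(10, 20), c = c(100, 200), d = c(1000, 2000))
# Duplicate names on purpose, with Mar before Feb so alphabetical order would differ
names(d3) <- c("RegionDesc", "Mar.2019", "Feb.2019", "Mar.2019", "Feb.2019")

is_num <- vapply(d3, is.numeric, logical(1))   # which columns are numeric
num_names <- names(d3)[is_num]

# Group column positions by name; fixing the factor levels to unique(num_names)
# keeps the groups in order of first appearance instead of alphabetical order
groups <- split(unname(which(is_num)),
                factor(num_names, levels = unique(num_names)))

# Row-wise sum within each group of same-named columns
summed <- lapply(groups, function(idx) rowSums(d3[idx], na.rm = FALSE))

# Non-numeric columns first, then one summed column per distinct name
result <- cbind(d3[!is_num], as.data.frame(summed, check.names = FALSE))
print(result)
```

With the toy values above, Mar.2019 holds 101 and 202 and Feb.2019 holds 1010 and 2020. Note that this sketch moves all non-numeric columns to the front, which may differ from the original column order.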

Upvotes: 1
