Reputation: 112
I have a dataframe (d3) which has some column names with "Date_Month.Year", I want to replace those column names with just "Month.Year" so if there are multiple columns with the same "Month.Year" they will just be a summed column.
Below is the code I tried and the output
library(stringr)
print(colnames(d3))
#below is output of the print statement
#[1] "ProductCategoryDesc" "RegionDesc" "SourceDesc" "variable"
#[5] "2019-02-28_Feb.2019" "2019-03-01_Mar.2019" "2019-03-04_Mar.2019" "2019-03-05_Mar.2019"
#[9] "2019-03-06_Mar.2019" "2019-03-07_Mar.2019" "2019-03-08_Mar.2019"
d3 <- d3 %>% mutate(col = str_remove(col, '*._'))
Here is the error I get:
Evaluation error: argument str
should be a character vector (or an object coercible to).
So I got the first part of my problem answered I used to get all column names in Month.Year format but now I am having issues with summing the columns that have the same name, for that I looked at Sum and replace columns with same name R for a data frame containing different classes
colnames(d3) <- gsub('.*_', '', colnames(d3))
Below is the code I used to get the columns summed that have a duplicate name, however with this code it is not necessarily putting the summed values in the correct columns.
indx <- sapply(d3, is.numeric)#check which columns are numeric
nm1 <- which(indx)#get the numeric index of the column
indx2 <- duplicated(names(nm1))|duplicated(names(nm1),fromLast=TRUE)
nm2 <- nm1[indx2]
indx3 <- duplicated(names(nm2))
d3[nm2[!indx3]] <- Map(function(x,y) rowSums(x[y],na.rm = FALSE),
list(d3),split(nm2, names(nm2)))
d3 <- d3[ -nm2[indx3]]
Upvotes: 0
Views: 1479
Reputation: 100
colnames(d3) <- sapply(colnames(d3), function(colname){
return( str_remove(colname, '.*_') )
})
The regex should be ".*_" to match the case you need
Upvotes: 0
Reputation: 12165
If you want to change the column names, you should be changing colnames
:
colnames(d3) <- gsub('.*_', '', colnames(d3))
Note, in your regex, quantifiers (ie *
) go after the thing they quantify. So it should be .*_
rather than *._
An example where we remove text before a .
in iris
:
colnames(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
# In regex, . means any character, so to match an actual '.',
# we need to 'escape' it with \\.
colnames(iris) <- gsub('.*\\.', '', colnames(iris))
colnames(iris)
[1] "Length" "Width" "Length" "Width" "Species"
Upvotes: 1