Yuki Weber

Reputation: 21

Remove part of character string in data frame

I'm struggling to delete the strings "D_1__", "D_2__", "D_3__", etc. in a data.frame while keeping the succeeding text, e.g. input: "D_1__succeeding text", output: "succeeding text".

I tried

df <- gsub("D_.__", "", df)

but nothing changed.

Furthermore, the columns of the imported data.frame are factors with more than one level. Could this be causing the problem, and if so, how can I convert the df?

Thanks a lot for your help!

Upvotes: 1

Views: 9352

Answers (3)

Yuki Weber

Reputation: 21

Thanks for your suggestions. In the end, I managed to convert all columns of my df to character vectors with:

df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)

Then I applied:

df$V1 <- gsub('D_.__', '', df$V1)

for each column separately. With just 15 columns this was quite feasible :-)
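As an alternative to handling each column separately, the conversion and the replacement can be done for every column in one pass. A minimal sketch, assuming all columns should be treated as text; the column names V1/V2 and the sample values are hypothetical:

```r
# Hypothetical example data: two factor columns carrying the "D_n__" prefix
df <- data.frame(
  V1 = c("D_1__first", "D_2__second"),
  V2 = c("D_3__third", "D_4__fourth"),
  stringsAsFactors = TRUE
)

# Convert each column to character and strip the prefix in one pass.
# as.character() on a factor returns its labels, not the integer codes.
df[] <- lapply(df, function(x) gsub("D_.__", "", as.character(x)))
```

Assigning into df[] rather than df keeps the data.frame structure and column names, since lapply() on its own returns a plain list.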

Upvotes: 0

Brian Stamper

Reputation: 2263

You are facing two issues: gsub is designed to work with character vectors, not entire data frames, and you are also dealing with factors instead of a character vector.
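A small sketch with hypothetical data shows the first issue: gsub() coerces the whole data frame with as.character(), so the result is a character vector with one element per column, not a modified data frame.

```r
# Hypothetical data with two character columns
df <- data.frame(
  V1 = c("D_1__a", "D_2__b"),
  V2 = c("D_3__c", "D_4__d"),
  stringsAsFactors = FALSE
)

# gsub() turns the data frame (a list of columns) into one string per column
res <- gsub("D_.__", "", df)

class(res)   # "character", no longer a data.frame
length(res)  # 2, one element per column
```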

I'm not sure how you are importing your data, but you probably have the option to use stringsAsFactors = FALSE to prevent the strings from being read as factors in the first place (since R 4.0.0 this is the default), e.g. for CSV data:

df <- read.csv('mydata.csv', stringsAsFactors = FALSE)
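A minimal sketch of the difference, using read.csv()'s text argument so the example runs without a file; the inline CSV is hypothetical and stands in for mydata.csv:

```r
# Inline CSV text standing in for 'mydata.csv' (hypothetical data)
csv <- "id,myvar\n1,D_1__alpha\n2,D_2__beta"

as_factor <- read.csv(text = csv, stringsAsFactors = TRUE)
as_chr    <- read.csv(text = csv, stringsAsFactors = FALSE)

class(as_factor$myvar)  # "factor"
class(as_chr$myvar)     # "character"
```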

Alternatively, you could convert your factors to strings:

df$myvar <- as.character(df$myvar)

Once you have a character vector, you can use gsub pretty much like you had it, just specify the variable:

df$myvar <- gsub('D_.__', '', df$myvar)
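Putting the two steps together for a single factor column (hypothetical data; myvar is the placeholder column name used above):

```r
df <- data.frame(
  myvar = c("D_1__alpha", "D_2__beta", "D_3__gamma"),
  stringsAsFactors = TRUE
)

# Step 1: factor -> character (as.character() returns the labels, not the codes)
df$myvar <- as.character(df$myvar)

# Step 2: strip the prefix and assign the result back to the same column
df$myvar <- gsub("D_.__", "", df$myvar)
```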

Finally, if you did want to leave your variable as a factor, you could rename the levels instead:

levels(df$myvar) <- gsub('D_.__', '', levels(df$myvar))
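A short sketch of the levels approach on hypothetical data. Because only the distinct levels are rewritten, every row sharing a level is updated at once and the column stays a factor:

```r
df <- data.frame(
  myvar = c("D_1__alpha", "D_2__beta", "D_1__alpha"),
  stringsAsFactors = TRUE
)

# Rewrite each distinct level once; the integer codes are left untouched
levels(df$myvar) <- gsub("D_.__", "", levels(df$myvar))
```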

Upvotes: 2

Nate

Reputation: 10671

strings <- c("D_1__text1", "D_2__text2", "D_3__text3")
new_strings <- gsub("D_\\d__", "", strings)

> new_strings
[1] "text1" "text2" "text3"
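Note that both "D_.__" and "D_\\d__" match exactly one character between "D_" and "__", so a prefix such as "D_12__" would be left untouched. If the numbering may go past 9 (an assumption, since the question only says "etc."), a sketch using \\d+ and a start-of-string anchor:

```r
# Hypothetical inputs with one-, two- and three-digit prefixes
strings <- c("D_1__text1", "D_12__text12", "D_123__text123")

# "D_\\d__" allows exactly one digit, so "D_12__" and "D_123__" are not removed
gsub("D_\\d__", "", strings)

# "\\d+" allows one or more digits and "^" anchors the match at the start;
# sub() suffices because the prefix can occur only once at the start
sub("^D_\\d+__", "", strings)
```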

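If the problem is specific to your data, add the output of dput(your_df) to your question. I think the problem is how you are storing the result: gsub returns a character vector, so assign it to a column rather than to the whole data frame. Something like df$colnew <- gsub(..., df$colold) should work.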

Upvotes: 0
