Reputation: 21
I'm struggling to delete the strings "D_1__", "D_2__", "D_3__", etc. in a data.frame while keeping the succeeding text, i.e. input: "D_1__succeeding text", output: "succeeding text".
I tried
df <- gsub("D_.__", "", df)
but nothing changed.
Furthermore, the columns of the imported data.frame are factors with more than one level. Could this be causing the problem, and how can I convert the df?
Thanks a lot for your help!
Upvotes: 1
Views: 9352
Reputation: 21
Thanks for your suggestions. In the end, I managed to convert all columns of my df from factor to character with:
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
Then I applied:
df$V1 <- gsub('D_.__', '', df$V1)
for each column separately. For just 15 columns this was quite feasible :-)
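For more columns, the per-column step can be done in one pass. The following is a minimal sketch, not the method used above: the example data and the column names V1 and V2 are made up for illustration, and it assumes every column holds strings that should be cleaned.

```r
# Hypothetical example data; the values and column names are assumptions.
df <- data.frame(
  V1 = c("D_1__alpha", "D_2__beta"),
  V2 = c("D_3__gamma", "plain"),
  stringsAsFactors = TRUE
)

# Convert each column to character and strip the prefix.
# Assigning into df[] keeps the data.frame structure
# instead of replacing df with a plain list.
df[] <- lapply(df, function(x) gsub("D_.__", "", as.character(x)))

print(df)
```

Values that do not match the pattern, such as "plain" here, are left unchanged.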
Upvotes: 0
Reputation: 2263
You are facing two issues: gsub is designed to work with character vectors, not entire data frames, and your columns are factors rather than character vectors.
I'm not sure how you are importing your data, but you probably have the option to use stringsAsFactors = FALSE to prevent the strings from being read as factors in the first place, e.g. for CSV data:
df <- read.csv('mydata.csv', stringsAsFactors = FALSE)
Alternatively, you could convert your factors to strings:
df$myvar <- as.character(df$myvar)
Once you have a character vector, you can use gsub much as you had it; just specify the variable and assign the result back to it:
df$myvar <- gsub('D_.__', '', df$myvar)
Finally, if you did want to leave your variable as a factor, you could rename the levels instead:
levels(df$myvar) <- gsub('D_.__', '', levels(df$myvar))
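As a small illustration of the level-renaming approach, here is a sketch on a standalone factor. The vector f and its values are invented for the example and are not taken from the question's data.

```r
# Hypothetical factor with prefixed levels.
f <- factor(c("D_1__text1", "D_2__text2", "D_1__text1"))

# Rewrite only the level labels; the underlying integer codes are untouched.
levels(f) <- gsub("D_.__", "", levels(f))

is.factor(f)     # still a factor
levels(f)        # the cleaned labels
as.character(f)  # the cleaned values, in the original order
```

One thing to keep in mind: if two different levels become identical after the prefix is removed, assigning the levels this way merges them into a single level.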
Upvotes: 2
Reputation: 10671
strings <- c("D_1__text1" , "D_2__text2" , "D_3__text3")
new_strings <- gsub("D_\\d__", "", strings)
new_strings
# [1] "text1" "text2" "text3"
If the problem is specific to your data, add the output of dput(your_df) to your question. I think the issue was how you were trying to store your result. Something like df$colnew <- gsub(..., df$colold) should work.
Upvotes: 0