Reputation: 54251
I have a column of data in an R data frame that has values such as:
Blue-#105
Green-#8845
Yellow-#5454
Blue-#999
I want to remove the trailing number part (everything from -# onward) so that Blue-#999 and Blue-#105 are considered the same thing when plotting. How could I accomplish this?
Upvotes: 2
Views: 452
Reputation: 368201
Use regular expressions:
> DF <- data.frame(col=c("Blue-#105", "Green-#8845", "Blue-#999"))
> DF
          col
1   Blue-#105
2 Green-#8845
3   Blue-#999
> DF$col <- gsub("-\\#.*", "", DF$col)
> DF
    col
1  Blue
2 Green
3  Blue
Here we say that any part of a string starting with -# and followed by whatever comes after it will get replaced by the empty string, or in other words, removed. The "whatever" is .* in regular expression lingo: any char (the dot) repeated as many times as it fits (the star). The # is escaped in the pattern above, which is harmless, but # is not a special character in a regular expression, so "-#.*" matches the same text.
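As a quick check of the pattern described above, here is a minimal sketch on a plain character vector. The names x, stripped and counts are only for illustration, and the values are the ones from the question. It uses the unescaped form of the pattern and then counts the cleaned labels to confirm that both Blue entries end up in the same group.

```r
# Illustrative vector; the values are taken from the question.
x <- c("Blue-#105", "Green-#8845", "Yellow-#5454", "Blue-#999")

# Drop everything from "-#" to the end of each string.
# sub() is enough here because each string has at most one match.
stripped <- sub("-#.*", "", x)

# Count the cleaned labels; both Blue entries now share one level.
counts <- table(stripped)
print(counts)
```

The cleaned vector can then be assigned back to the data frame column before plotting.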
Upvotes: 7
Reputation: 49640
Use the sub or gsub function. For your example you could do something like:
newcolors <- sub("^([^-]*)-.*$", "\\1", oldcolors )
This assumes that the colors are in a vector 'oldcolors' and puts the results into 'newcolors'. The pattern starts at the beginning of the string (^), then matches 0 or more characters that are not dashes ([^-]*); the parens around that part say to save what is matched. Then it matches a dash followed by any further characters (.*) up to the end of the string ($). The matched portion (the entire string) is then replaced by whatever was matched within the parens (the color).
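To make the behaviour of the capture-group pattern concrete, here is a small runnable sketch. The contents of oldcolors are an assumption for illustration: the four values from the question plus two hypothetical ones, "Red" (no dash at all) and "Dark-Blue-#7" (more than one dash).

```r
# Illustrative input: question values plus two hypothetical edge cases.
oldcolors <- c("Blue-#105", "Green-#8845", "Yellow-#5454", "Blue-#999",
               "Red", "Dark-Blue-#7")

# Keep only the text before the first dash; "\\1" refers to the
# part of the match captured by the parentheses.
newcolors <- sub("^([^-]*)-.*$", "\\1", oldcolors)

print(newcolors)
```

A string without a dash does not match the pattern, so "Red" is returned unchanged. Because [^-]* stops at the first dash, "Dark-Blue-#7" becomes "Dark"; if color names may themselves contain dashes, a pattern anchored on "-#" would be the safer choice.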
Upvotes: 3