Reputation: 454
I am trying to rename taxa annotations in an abundance matrix for bubble plot creation (Original data 16S MiSeq). My data frame "data_melt" is shown below:
And I am looking to rename the taxa IDs in the "variable" column to simply the last name (class level). For example: "D_0__Archaea.D_1__Altiarchaeota.D_2__Altiarchaeia" to "Altiarchaeia".
I have attempted
data_melt$variable <- gsub("D_0__[A-z].D_1__[A-z].D_2__", "", data_melt$variable)
to no avail. I have used this line of code on other datasets successfully, but it makes no change to "data_melt", and there are no warning or error messages. Any ideas?
Thank you in advance,
J
Upvotes: 1
Views: 43
Reputation: 626870
You might fix your approach by replacing [A-z]. with [A-Za-z]+\\. as follows:
data_melt$variable <- sub("D_0__[A-Za-z]+\\.D_1__[A-Za-z]+\\.D_2__", "", data_melt$variable)
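The [A-z] class matches more than just letters (it also covers the ASCII punctuation between "Z" and "a", such as the underscore), and without a quantifier it matches exactly one character, so it cannot span a whole taxon name; that is why your pattern never matched and nothing changed. The unescaped . matches any char, while you wanted to match a literal dot. When the dot is escaped, it only matches a literal dot.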
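To see the difference, here is a minimal sketch that runs both patterns on two made-up taxa strings. The vector name taxa and its values are assumptions standing in for data_melt$variable, and perl = TRUE is used in the last check only to make the character range unambiguous (range handling in R's default regex engine can depend on the locale).

```r
# Hypothetical sample values standing in for data_melt$variable
taxa <- c("D_0__Archaea.D_1__Altiarchaeota.D_2__Altiarchaeia",
          "D_0__Bacteria.D_1__Firmicutes.D_2__Bacilli")

# Pattern from the question: each [A-z] consumes exactly one character,
# so the pattern cannot match and the strings come back unchanged
original <- gsub("D_0__[A-z].D_1__[A-z].D_2__", "", taxa)

# Corrected pattern: "+" lets the class span the whole name and
# "\\." matches a literal dot
fixed <- sub("D_0__[A-Za-z]+\\.D_1__[A-Za-z]+\\.D_2__", "", taxa)

# [A-z] also accepts non-letters that sit between "Z" and "a" in ASCII
underscore_matches <- grepl("[A-z]", "_", perl = TRUE)

print(original)
print(fixed)
print(underscore_matches)
```

The unchanged output of the first call mirrors the silent "no change, no error" behaviour described in the question: a pattern that never matches is not an error for gsub.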
However, you may solve the problem by removing all up to and including the last underscore:
sub(".*_", "", data_melt$variable)
Note that you may use sub instead of gsub, as you expect only one replacement to be made per string.
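As a rough illustration of the shorter approach, the sketch below applies it to a small stand-in data frame. The contents of data_melt here (the two taxa strings and the value column) are invented for the example and are not the asker's data.

```r
# Hypothetical stand-in for the asker's melted data frame
data_melt <- data.frame(
  variable = c("D_0__Archaea.D_1__Altiarchaeota.D_2__Altiarchaeia",
               "D_0__Bacteria.D_1__Firmicutes.D_2__Bacilli"),
  value = c(10, 25),
  stringsAsFactors = FALSE
)

# The greedy ".*" runs as far as it can, so ".*_" removes everything
# up to and including the last underscore in each string
data_melt$variable <- sub(".*_", "", data_melt$variable)

print(data_melt$variable)
```

One caveat with this version: a class name that itself contains an underscore would be cut at that underscore, so anchoring on the rank prefix, for instance sub(".*D_2__", "", data_melt$variable), may be safer for such data.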
Upvotes: 1