Geomicro

Reputation: 454

How may I rename variables in a df using gsub()?

I am trying to rename taxa annotations in an abundance matrix for bubble plot creation (Original data 16S MiSeq). My data frame "data_melt" is shown below:

[Image: screenshot of the "data_melt" data frame]

And I am looking to rename the taxa IDs in the "variable" column to simply the last name (class level). For example: "D_0__Archaea.D_1__Altiarchaeota.D_2__Altiarchaeia" to "Altiarchaeia".

I have attempted

data_melt$variable <- gsub("D_0__[A-z].D_1__[A-z].D_2__", "", data_melt$variable)

to no avail. I have used this line of code on other datasets successfully, but it makes no change to "data_melt", and there are no warning or error messages. Any ideas?

Thank you in advance,

J

Upvotes: 1

Views: 43

Answers (1)

Wiktor Stribiżew

Reputation: 626870

You might fix your approach by replacing [A-z]. with [A-Za-z]+\\.:

data_melt$variable <- sub("D_0__[A-Za-z]+\\.D_1__[A-Za-z]+\\.D_2__", "", data_melt$variable)

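The [A-z] range matches more than just letters (it also covers [, \, ], ^, _ and the backtick), and without a quantifier it matches only a single character. An unescaped . matches any character, while you wanted to match a literal dot. When the dot is escaped, it only matches a literal dot.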
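As a quick check, here is a minimal sketch that runs both patterns on the example string from the question. The variable names x, unchanged and fixed are only for illustration and are not part of the original data.

```r
# Example value copied from the question
x <- "D_0__Archaea.D_1__Altiarchaeota.D_2__Altiarchaeia"

# Original pattern: each [A-z] can only consume one character,
# so the pattern never reaches "D_1__" and nothing is replaced
unchanged <- gsub("D_0__[A-z].D_1__[A-z].D_2__", "", x)

# Corrected pattern: + allows one or more letters, \\. is a literal dot
fixed <- sub("D_0__[A-Za-z]+\\.D_1__[A-Za-z]+\\.D_2__", "", x)

print(unchanged)
print(fixed)
```

This also explains the lack of warnings: when a pattern does not match, gsub simply returns its input untouched.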

However, you may solve the problem by removing all up to and including the last underscore:

sub(".*_", "", data_melt$variable)

Note that sub is enough here, since you expect only one replacement per string.
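A minimal sketch of the shorter approach follows, assuming every ID has the form shown in the question. The second ID in the taxa vector is a made-up sample for illustration, not taken from the original data.

```r
# First value is from the question; the second is a hypothetical extra ID
taxa <- c(
  "D_0__Archaea.D_1__Altiarchaeota.D_2__Altiarchaeia",
  "D_0__Bacteria.D_1__Proteobacteria.D_2__Gammaproteobacteria"
)

# .* is greedy, so ".*_" consumes everything up to and including
# the last underscore in each string
short <- sub(".*_", "", taxa)

print(short)
```

One caveat with this pattern: if a class name itself contains an underscore, everything before that underscore is removed as well, so the more explicit pattern may be the safer choice for such data.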

Upvotes: 1
