Reputation: 113
I have a dataframe with two columns and a few hundred rows, let's call it df, which looks like this -
Name Chemical_Formula
PALMITYL-COA C37H62N7O17P3S1
CPD0-888 C34H52N7O24P2
3-OXOPALMITOYL-COA C37H60N7O18P3S1
OH-MYRISTOYL C43H75N3O20P2
CPD-19171 C39H64N7O18P3S1
CPD-15253 C52H99N3O13P2
CPD-12122 C75H112O2
CPD0-937 C149H260N2O78P4
.... .....
.... .....
Now if the Chemical_Formula for some of the compounds ends in 1, I want to remove that 1 from the chemical formula. For example, for the first compound, PALMITYL-COA, the chemical formula is C37H62N7O17P3S1, which ends in 1. So in my new dataframe I want the chemical formula for this first compound to be C37H62N7O17P3S.
So, my new dataframe should look like this -
Name Chemical_Formula
PALMITYL-COA C37H62N7O17P3S
CPD0-888 C34H52N7O24P2
3-OXOPALMITOYL-COA C37H60N7O18P3S
OH-MYRISTOYL C43H75N3O20P2
CPD-19171 C39H64N7O18P3S
CPD-15253 C52H99N3O13P2
CPD-12122 C75H112O2
CPD0-937 C149H260N2O78P4
.... .....
.... .....
I want to keep all the chemical formulas as they are if they don't end in the number 1. For the ones which end in 1, I just want to remove that 1, keeping the rest of the formula as it is.
I was looking for ways to do this using the gsub, sub, grepl or subset functions, but I'm not quite sure what pattern to give using the regular expression rules. Please help!
Upvotes: 1
Views: 73
Reputation: 133770
The following may help you here. I am using sub, the substitute function of base R, to replace a 1 with an empty string if it is at the end of the element.
sub("1$","",df$Chemical_Formula)
To save this output into the same column, put df$Chemical_Formula <- in front of the above code.
Explanation of code:
sub: a base R function which is called as sub(regex_needs_to_be_used_to_replace_present_content,"with_new_content",variable)
"1$": tells sub to act only upon those elements of df's column Chemical_Formula which end with 1; the $ anchors the match to the end of the string.
"": if the above match is found in any value, the trailing 1 is replaced with an empty string, as per the OP's request.
df$Chemical_Formula: the column named Chemical_Formula of the data frame named df.
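As a quick check, the call can be tried on a few rows copied from the question. This is only a sketch: the four-row data frame below is rebuilt by hand and stands in for the real df, which has a few hundred rows.

```r
# Sample rows copied from the question; a stand-in for the real df.
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888", "3-OXOPALMITOYL-COA", "CPD-12122"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2",
                       "C37H60N7O18P3S1", "C75H112O2"),
  stringsAsFactors = FALSE
)

# Drop a trailing "1"; values that do not end in 1 come back unchanged.
df$Chemical_Formula <- sub("1$", "", df$Chemical_Formula)

print(df)
```

Only the first and third formulas lose their final 1; the other two are left as they were.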
Upvotes: 2
Reputation: 52977
Here's how
df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)
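The dollar sign after the 1 means end of a string, so it will only remove a 1 if it is located at the end. Because the pattern is anchored there, it can match at most once per string, so sub would give the same result as gsub here.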
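One caveat that may or may not matter for your data: "1$" removes any trailing 1, including the last digit of a count such as 21 or 11. If such formulas can occur, a slightly stricter pattern that requires a letter before the 1 might be safer. The formulas below are hypothetical and chosen only to show the difference; they are not from the question.

```r
# Hypothetical formulas for illustration; "C10H21" ends in the count 21.
formulas <- c("C37H62N7O17P3S1", "C10H21", "C34H52N7O24P2")

# Plain "1$" strips any trailing 1, so "C10H21" is cut down to "C10H2".
naive <- sub("1$", "", formulas)

# Requiring a letter before the 1 leaves multi-digit counts alone.
# The captured letter is put back via the \\1 backreference.
strict <- sub("([A-Za-z])1$", "\\1", formulas)

print(naive)
print(strict)
```

If none of your formulas end in a multi-digit count, the simpler "1$" pattern is enough.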
Upvotes: 3