Reputation: 113

How to remove a specific character from all the rows in one column of a dataframe

I have a dataframe with two columns and a few hundred rows, let's call it df which looks like this -

Name                 Chemical_Formula
PALMITYL-COA         C37H62N7O17P3S1
CPD0-888             C34H52N7O24P2
3-OXOPALMITOYL-COA   C37H60N7O18P3S1
OH-MYRISTOYL         C43H75N3O20P2
CPD-19171            C39H64N7O18P3S1
CPD-15253            C52H99N3O13P2
CPD-12122            C75H112O2
CPD0-937             C149H260N2O78P4
....                 .....
....                 .....

Now, if the Chemical_Formula for a compound ends in 1, I want to remove that 1 from the chemical formula. For example, for the first compound, PALMITYL-COA, the chemical formula is C37H62N7O17P3S1, which ends in 1. So in my new dataframe I want the chemical formula for this first compound to be C37H62N7O17P3S.

So, my new dataframe should look like this -

Name                 Chemical_Formula
PALMITYL-COA         C37H62N7O17P3S
CPD0-888             C34H52N7O24P2
3-OXOPALMITOYL-COA   C37H60N7O18P3S
OH-MYRISTOYL         C43H75N3O20P2
CPD-19171            C39H64N7O18P3S
CPD-15253            C52H99N3O13P2
CPD-12122            C75H112O2
CPD0-937             C149H260N2O78P4
....                 .....
....                 .....

I want to keep all the chemical formulas as they are if they don't end in the number 1. For the ones that end in 1, I just want to remove that 1 and keep the rest of the formula as it is.

I was looking for ways to do this using the gsub, sub, grepl, or subset functions, but I'm not sure what pattern to use in the regular expression. Please help!

Upvotes: 1

Answers (2)

RavinderSingh13

Reputation: 133770

The following may help. It uses base R's sub (substitute) function to replace a 1 at the end of each element with an empty string.

sub("1$","",df$Chemical_Formula)

To save the output back into the same column, put df$Chemical_Formula <- in front of the code above.

Explanation of code:

sub: a base R function called as sub(pattern, replacement, x), where pattern is the regular expression to match, replacement is the new content, and x is the character vector to work on.

"1$": tells sub to act only on values that end in 1. The $ anchors the match to the end of the string.

"": the replacement. When a value matches, its trailing 1 is replaced with an empty string (not NULL), which removes it, as the OP requested.

df$Chemical_Formula: the column named Chemical_Formula in the data frame named df.
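
To see the pieces working together, here is a minimal sketch that rebuilds three rows of the question's data and assigns the result back to the column. Only those three rows are reproduced, and stringsAsFactors = FALSE is an assumption added so the column is a character vector on older R versions too.

```r
# Small sample of the data from the question (only three rows reproduced)
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888", "3-OXOPALMITOYL-COA"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2", "C37H60N7O18P3S1"),
  stringsAsFactors = FALSE
)

# Remove a single trailing "1"; values that do not end in 1 come back unchanged
df$Chemical_Formula <- sub("1$", "", df$Chemical_Formula)

print(df)
```

The second row is left untouched because its formula ends in 2, while the first and third lose their final 1.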

Upvotes: 2

stevec

Reputation: 52977

Here's how:

df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)

The dollar sign after the 1 matches the end of the string, so a 1 is removed only if it is the last character.
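
Two side notes, sketched under assumptions. First, because the pattern is anchored with $, it can match at most once per string, so sub and gsub give the same result here. Second, a plain "1$" would also cut the last digit off a count such as 21. The value "C21" below is hypothetical and not from the question's data, and the stricter pattern is one possible variant, not something the question asked for.

```r
# "C21" is a made-up value used only to illustrate the edge case
formulas <- c("C37H62N7O17P3S1", "C34H52N7O24P2", "C21")

# Anchored pattern: at most one match per string, so sub() and gsub() agree
same_result <- identical(sub("1$", "", formulas), gsub("1$", "", formulas))

# Plain "1$" turns the hypothetical "C21" into "C2"
plain <- sub("1$", "", formulas)

# Stricter variant: drop the 1 only when it directly follows a letter,
# keeping that letter via the \\1 backreference
strict <- sub("([A-Za-z])1$", "\\1", formulas)

print(same_result)
print(plain)
print(strict)
```

If every trailing 1 in the real data follows an element symbol, as in the rows shown in the question, both patterns behave identically.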

Upvotes: 3
