Ally Kat

Reputation: 211

Removing everything after a character in a column in R

I need to remove everything after the question mark in a column.

I have a data set, for example:

my.data

BABY      MOM      LANDING
mark      dina     www.example.com/?kdvhzkajvkadjf
tom       becky    www.example.com/?ghkadkho[qeu
brad      tina     www.example.com/?klsdfngal;j

I want my new data to be:

new.data

BABY      MOM      LANDING
mark      dina     www.example.com/?
tom       becky    www.example.com/?
brad      tina     www.example.com/?

How do I tell R to remove everything after the ? in my.data$LANDING?

Upvotes: 8

Views: 15169

Answers (1)

akrun

Reputation: 887118

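We can use sub to remove the characters that come after the ?. The pattern uses a positive lookbehind ((?<=\\?)) so that .* matches zero or more characters (.) starting immediately after a ?, and that match is replaced with ''. Lookbehind is a Perl-style feature, hence perl=TRUE.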

 my.data$LANDING <- sub('(?<=\\?).*$', '', my.data$LANDING, perl=TRUE)
 my.data
 #  BABY   MOM       LANDING
 #1 mark  dina www.example.com/?
 #2  tom becky www.example.com/?
 #3 brad  tina www.example.com/?
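
For a reproducible check, the question's table can be rebuilt as a data frame and the lookbehind pattern applied to a copy. This is only a minimal sketch: the data.frame() call is an assumption about how my.data is stored (plain character columns); the values themselves are taken from the question.

```r
# Assumed construction of my.data; the question only shows the printed table.
my.data <- data.frame(
  BABY    = c("mark", "tom", "brad"),
  MOM     = c("dina", "becky", "tina"),
  LANDING = c("www.example.com/?kdvhzkajvkadjf",
              "www.example.com/?ghkadkho[qeu",
              "www.example.com/?klsdfngal;j"),
  stringsAsFactors = FALSE
)

# Work on a copy so the original column stays available for comparison.
new.data <- my.data
new.data$LANDING <- sub("(?<=\\?).*$", "", new.data$LANDING, perl = TRUE)

print(new.data)
```

Only LANDING changes; BABY and MOM are carried over untouched.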

Another option is to use a capture group and pass a backreference to it (\\1) as the replacement, i.e. the second argument.

 my.data$LANDING <- sub('([^?]+\\?).*', '\\1', my.data$LANDING)

Here, we match one or more characters that are not ? ([^?]+) followed by a ? (\\?), and wrap them in parentheses to capture them as a group (([^?]+\\?)). The remaining characters (.*) are matched outside the group, so they are dropped when the whole match is replaced by \\1.
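
To see what the group actually captures, base R's regexec() and regmatches() can be used to pull out the full match and the group separately. This is just an illustrative sketch on one value from the question; the variable names x and m are made up for the example.

```r
x <- "www.example.com/?klsdfngal;j"

# regexec() reports the full match plus each capture group;
# regmatches() extracts the corresponding substrings.
m <- regmatches(x, regexec("([^?]+\\?).*", x))[[1]]

m[1]  # the full match: the entire string
m[2]  # group 1: everything up to and including the ?
```

Since sub() replaces the full match (m[1]) with the contents of group 1 (m[2]), only the part up to and including the ? survives.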

Or, as @Frank mentioned in the comments, we can match the ? together with the rest of the characters (\\?.*) and put a ? back as the second argument. The replacement is not a regular expression, so a plain "?" works there as well as "\\?".

 my.data$LANDING <- sub("\\?.*", "\\?", my.data$LANDING)
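
The three patterns agree on the data in the question, but it may be worth checking inputs that are not in it. The sketch below wraps each call in a small helper (the strip_* names and the extra test strings are hypothetical, added only for illustration) and runs them side by side.

```r
# Hypothetical wrappers around the three sub() calls from this answer.
strip_lookbehind <- function(x) sub("(?<=\\?).*$", "", x, perl = TRUE)
strip_capture    <- function(x) sub("([^?]+\\?).*", "\\1", x)
strip_literal    <- function(x) sub("\\?.*", "?", x)

# One value from the question plus three made-up edge cases.
urls <- c("www.example.com/?klsdfngal;j",  # from the question
          "www.example.com/?a?b",          # more than one ?
          "www.example.com/",              # no ? at all
          "?abc")                          # ? is the first character

results <- data.frame(
  input      = urls,
  lookbehind = strip_lookbehind(urls),
  capture    = strip_capture(urls),
  literal    = strip_literal(urls),
  stringsAsFactors = FALSE
)
print(results)
```

All three cut at the first ? and leave a string without any ? unchanged. They differ only when the ? is the very first character: [^?]+ needs at least one character before the ?, so the capture-group version leaves "?abc" as it is, while the other two return "?".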

Upvotes: 13
