Peter Chung

Reputation: 1122

R partial gsub in a column

How can I apply gsub to only some rows of the first column of df? I can remove everything from the first colon onward, but I want to keep the full value in the rows that start with 19.

df$V1:

rs1231243:G:T:0
rs483294:C:T:5098723
19:4783234:T:G
rs19873423:A:C
19:83947355:C:T
kpg897324
rs3287492:G:C

Desired output:

rs1231243
rs483294
19:4783234:T:G
rs19873423
19:83947355:C:T
kpg897324
rs3287492

code:
df$V1 <- gsub("\\:.*","",df$V1)

I don't know how to apply gsub conditionally, or whether another method would do it. Please advise. Thanks.

Upvotes: 3

Views: 191

Answers (2)

Jan

Reputation: 43169

You can use a negative lookahead, which makes the pattern match only when the string does not start with 19:

gsub("^(?!19)([^:]+).*", "\\1", df$V1, perl = TRUE)

See a demo on regex101.com.


Applied to the sample data, with the result stored in a new column V2, this yields:

df["V2"] <- gsub("^(?!19)([^:]+).*", "\\1", df$V1, perl = TRUE)
df
                    V1              V2
1      rs1231243:G:T:0       rs1231243
2 rs483294:C:T:5098723        rs483294
3       19:4783234:T:G  19:4783234:T:G
4       rs19873423:A:C      rs19873423
5      19:83947355:C:T 19:83947355:C:T
6            kpg897324       kpg897324
7        rs3287492:G:C       rs3287492
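
For anyone who wants to try this without the original data frame, here is a minimal self-contained sketch. The vector `ids` is a stand-in for `df$V1`, and the stricter `19:` variant is an assumption about the intent rather than part of the answer above: it protects only values whose first field is exactly 19, so an ID that merely begins with the digits 19 would still be trimmed.

```r
# Sample values standing in for df$V1 (copied from the question).
ids <- c("rs1231243:G:T:0", "rs483294:C:T:5098723", "19:4783234:T:G",
         "rs19873423:A:C", "19:83947355:C:T", "kpg897324", "rs3287492:G:C")

# ^(?!19)   match only if the string does not start with "19"
# ([^:]+)   capture everything up to the first colon
# .*        consume the rest, which the replacement "\\1" then drops
trimmed <- gsub("^(?!19)([^:]+).*", "\\1", ids, perl = TRUE)

# Assumed stricter variant: skip only values whose first field is exactly "19".
trimmed_strict <- gsub("^(?!19:)([^:]+).*", "\\1", ids, perl = TRUE)

print(trimmed)
```

On the sample data both patterns give the same result; they differ only for values such as a hypothetical `19123:A:C`, which the first pattern leaves alone and the second trims.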

Upvotes: 4

RAVI PIPALIYA

Reputation: 21

Since you want to apply a condition to each value of the vector, you can use the ifelse function:

ifelse(test, yes, no)

Arguments

test - an object which can be coerced to logical mode.

yes - return values for true elements of the test.

no - return values for false elements of the test.

The code below should work:

df$V1 <- ifelse(grepl("^19",df$V1), # Test
                df$V1, # yes
                gsub("\\:.*","",df$V1)) # No
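
A self-contained version of the same idea is sketched below, in case it helps to test it in isolation. The sample data frame is rebuilt from the question, and two details are additions rather than part of the answer: the `as.character()` call, a precaution in case `V1` was read in as a factor (ifelse would then yield the factor's integer codes instead of the labels), and the test on `"19:"` instead of `"^19"`, which assumes only rows whose first field is exactly 19 should be kept whole.

```r
# Sample data standing in for the asker's df.
df <- data.frame(
  V1 = c("rs1231243:G:T:0", "rs483294:C:T:5098723", "19:4783234:T:G",
         "rs19873423:A:C", "19:83947355:C:T", "kpg897324", "rs3287492:G:C"),
  stringsAsFactors = FALSE
)

# Precaution (an addition here): make sure we work on character values.
v1 <- as.character(df$V1)

df$V1 <- ifelse(startsWith(v1, "19:"),  # test: first field is exactly "19"
                v1,                      # yes: keep the value untouched
                sub(":.*", "", v1))      # no: drop everything from the first colon

print(df$V1)
```

`sub` is enough here because `:.*` can match at most once per string, and the colon needs no escaping in a regular expression.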

Upvotes: 1
