codemachino

Reputation: 33

Removing all characters in a variable after a specific character in R

I have a dataset df1 like so:

snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01 , 6.315047e-01  )

df1 <- data.frame(snp, p.value)

I want to remove the underscore and the letter after it (the allele) from the snp column of df1 and store the result in a new data frame, df2.

I tried this using the code

df2 <- df1[,c("snp", "allele"):=tstrsplit(`snp`, "_", fixed = TRUE)]

However, this changes the df1 data frame. Is there another way to do this?

Upvotes: 0

Views: 669

Answers (4)

akrun

Reputation: 887048

Consider creating a copy of the dataset and running tstrsplit on the copy, so the original data is not changed:

library(data.table)
df2 <- copy(df1)
setDT(df2)[,c("snp", "allele") := tstrsplit(snp, "_", fixed = TRUE)]
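
To check that the original is left alone, here is a minimal self-contained sketch of the same approach. It rebuilds df1 from the question and assumes the data.table package is installed; the two names() calls at the end are only there for inspection.

```r
library(data.table)

# Rebuild the example data from the question
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)
df1 <- data.frame(snp, p.value)

# Work on a deep copy so that := (which modifies by reference) cannot touch df1
df2 <- copy(df1)
setDT(df2)[, c("snp", "allele") := tstrsplit(snp, "_", fixed = TRUE)]

names(df1)  # original columns only
names(df2)  # original columns plus the new allele column
```

The copy() step matters because := updates a data.table in place rather than returning a modified copy.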

Upvotes: 0

Gregor Thomas

Reputation: 145765

This is my best guess as to what you want:

library(tidyr)
separate(df1, snp, into = c("snp", "allele"), sep = "_")
#         snp allele   p.value
# 1 rs7513574      T 0.2635489
# 2 rs1627238      A 0.9836280
# 3 rs1171278      C 0.6315047

Upvotes: 1

Marcos Pérez

Reputation: 1250

Try:

library(dplyr)
df2 <- df1 %>% mutate(snp = gsub("_.*", "", snp))
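
If you would rather not load any package, the same idea can be sketched in base R. This is only an illustration using the example data from the question: the pattern "_.*" matches the first underscore and everything after it, so suffixes longer than one character are removed as well, and transform() returns a new data frame rather than changing df1.

```r
# Rebuild the example data from the question
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)
df1 <- data.frame(snp, p.value)

# Strip the underscore and everything after it; df1 itself is not modified
df2 <- transform(df1, snp = sub("_.*", "", snp))

df2$snp  # IDs without the allele suffix
df1$snp  # still carries the _T / _A / _C suffixes
```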

Upvotes: 0

user438383

Reputation: 6206

df2 = df1 %>% 
    dplyr::mutate(across(c(V1, V2, V3), ~stringr::str_remove_all(., "_[:alpha:]")))
> df2
               V1        V2        V3
snp     rs7513574 rs1627238 rs1171278
p.value 0.2635489  0.983628 0.6315047

Upvotes: 0
