Peter Chung

Reputation: 1122

R remove string before delimiter

I have a column in a data frame, and I would like to strip each string of everything before the 5th "." delimiter, as well as the trailing ".txt". I don't know how to do it.

Input:

jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt
jhu-usc.edu_BCD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1482-05.txt
jhu-usc.edu_LGG.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1483-05.txt
jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1484-05.txt
jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1485-05.txt
jhu-usc.edu_BRCA.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1486-05.txt
jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1487-05.txt
jhu-usc.edu_PRCA.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1488-05.txt

Desired output:

TCGA-06-5415-01A-01D-1481-05
TCGA-06-5415-01A-01D-1482-05
TCGA-06-5415-01A-01D-1483-05
TCGA-06-5415-01A-01D-1484-05
TCGA-06-5415-01A-01D-1485-05
TCGA-06-5415-01A-01D-1486-05
TCGA-06-5415-01A-01D-1487-05
TCGA-06-5415-01A-01D-1488-05

I tried:

sapply(strsplit(as.character(df$V1), "."), '[', 1:5)

Please advise. Thank you.

Upvotes: 0

Views: 548

Answers (2)

Andrew Gustar

Reputation: 18425

If they all end with .txt, you could capture the last dot-free segment before the extension:

sub(".+\\.([^.]+)\\.txt$", "\\1", as.character(df$V1))
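As a minimal sketch of how this behaves, the snippet below runs the substitution on two of the file names from the question. The data frame `df` and its column `V1` are rebuilt here only for illustration. The pattern escapes the dot before `txt` and anchors on the end of the string, so only a literal `.txt` suffix is stripped.

```r
# Two sample file names copied from the question
x <- c(
  "jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1481-05.txt",
  "jhu-usc.edu_BCD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1482-05.txt"
)

# Illustrative data frame; the column may be a factor, hence as.character() below
df <- data.frame(V1 = x, stringsAsFactors = TRUE)

# ".+\\."    : greedily consume everything up to the last dot before the ID
# "([^.]+)"  : capture the run of non-dot characters (the ID)
# "\\.txt$"  : require a literal ".txt" at the end of the string
ids <- sub(".+\\.([^.]+)\\.txt$", "\\1", as.character(df$V1))
print(ids)
```

Because `.+` is greedy, the capture group always ends up holding the last dot-free segment before the extension, regardless of how many dots precede it.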

Upvotes: 1

akrun

Reputation: 887028

Assuming the part to keep always starts with "TCGA" and contains no ".", and that str1 holds the strings as a character vector:

sub(".*(TCGA[^.]+).*", "\\1", str1)
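One thing worth knowing about this approach: `sub()` returns the input unchanged when the pattern does not match. The sketch below assumes `str1` is a plain character vector and adds a hypothetical non-matching entry (`"notes.txt"`, not from the question) to show one way of flagging such cases as `NA`.

```r
# Two sample file names from the question plus one made-up non-matching entry
str1 <- c(
  "jhu-usc.edu_LGG.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1483-05.txt",
  "jhu-usc.edu_LUAD.HumanMethylation450.6.lvl-3.TCGA-06-5415-01A-01D-1484-05.txt",
  "notes.txt"  # hypothetical entry without a TCGA identifier
)

# Keep "TCGA" followed by every character up to the next dot
ids <- sub(".*(TCGA[^.]+).*", "\\1", str1)

# Strings without "TCGA" come back unchanged, so mark them as missing
has_id <- grepl("TCGA", str1, fixed = TRUE)
ids[!has_id] <- NA_character_
print(ids)
```

Unlike the `.txt`-based pattern, this version does not depend on the file extension, but it does rely on the ID prefix being known in advance.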

Upvotes: 1
