user2439887

Reputation: 61

R - Remove commas from values in a column and place separated values into new rows

I have a column of gene symbols that I have retrieved directly from a database, and some of the rows contain two or more symbols which are comma separated (see example below).

SLC6A13
ATP5J2-PTCD1,BUD31,PTCD1
ACOT7
BUD31,PDAP1
TTC26

I would like to remove the commas, and place the separated symbols into new rows like so:

SLC6A13
ATP5J2-PTCD1
BUD31
PTCD1
ACOT7
BUD31
PDAP1
TTC26

I haven't been able to find a straightforward way to do this in R. Does anyone have any suggestions?

Upvotes: 2

Views: 1376

Answers (2)

agstudy

Reputation: 121568

Another option is to use readLines and strsplit (here txt is a character string holding the column contents):

unlist(strsplit(readLines(textConnection(txt)),','))
 "SLC6A13"      "ATP5J2-PTCD1" "BUD31"        "PTCD1"        "ACOT7"        
 "BUD31"        "PDAP1"        "TTC26"  
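If the symbols already sit in a data frame column rather than in a text string, the same strsplit idea can be applied to the column directly. The following is a minimal sketch, not part of the original answer; the data frame name `genes` and the column name `symbol` are assumptions made for illustration.

```r
# Hypothetical data frame standing in for the database result
genes <- data.frame(
  symbol = c("SLC6A13", "ATP5J2-PTCD1,BUD31,PTCD1", "ACOT7",
             "BUD31,PDAP1", "TTC26"),
  stringsAsFactors = FALSE
)

# Split each entry on commas; fixed = TRUE treats "," as a literal string
parts <- strsplit(genes$symbol, ",", fixed = TRUE)

# Flatten the list so that each symbol gets its own row
long <- data.frame(symbol = unlist(parts), stringsAsFactors = FALSE)
```

If the column was read in as a factor, wrapping it in as.character() before the strsplit call should be enough, since strsplit only accepts character input.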

Upvotes: 1

IRTFM

Reputation: 263342

You can read the values into a character vector with scan, then put that vector into a matrix or a data.frame:

vec <- scan(text="SLC6A13
ATP5J2-PTCD1,BUD31,PTCD1
ACOT7
BUD31,PDAP1
TTC26", what=character(), sep=",")
Read 8 items
 vec
[1] "SLC6A13"      "ATP5J2-PTCD1" "BUD31"        "PTCD1"        "ACOT7"        "BUD31"        "PDAP1"       
[8] "TTC26"       

Perhaps:

 as.matrix(vec)

(The scan function can also read from files. The "text" parameter was only added relatively recently, but it saves typing file=textConnection("...").)
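As a rough illustration of the file-based route mentioned above, the sketch below writes the example column to a temporary file, reads it back with scan, and stores the result as a one-column data frame. The temporary file and the names `path` and `symbols` are assumptions for the example, not part of the original answer.

```r
# Write the example column to a temporary file (stand-in for a real file)
path <- tempfile(fileext = ".txt")
writeLines(c("SLC6A13", "ATP5J2-PTCD1,BUD31,PTCD1", "ACOT7",
             "BUD31,PDAP1", "TTC26"), path)

# sep = "," splits fields on commas; line breaks also end a field
vec <- scan(path, what = character(), sep = ",", quiet = TRUE)

# One symbol per row in a single-column data frame
symbols <- data.frame(symbol = vec, stringsAsFactors = FALSE)

# Clean up the temporary file
unlink(path)
```

Note that when sep is given, scan keeps leading and trailing spaces inside fields unless strip.white = TRUE is also set, which may matter if the source file pads its values.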

Upvotes: 4
