TCFP HCDG

Reputation: 45

How to split a column in R?

I also want to split the same column in the same way. I tried to do it as follows, but it is not working properly.

The code I used is:

t38kbat = read.table("test38kbat.txt", header = FALSE)
head(t38kbat)

t38kbat <- separate (t38kbat, V2, c("sp", "id", "gene_organism"), \\"|")
t38kbat <- separate (t38kbat, gene_organism, c("gene", "organism"), \\"_")
t38kbat <- unite (t38kbat, sp, sp, id, sep = "|")

When I run the script, I receive this error:

Error: unexpected input in "t38kbat <- separate (t38kbat, V2, c("sp", "id", "gene_organism"), \"

Can anybody guide me on how to resolve it? Thanks.

Upvotes: 3

Views: 165

Answers (2)

Sevyns

Reputation: 3282

It seems to me (since there is not a common delimiter to split on) that substring() might be helpful to you. substring() requires a starting and ending position; if this is predictable (and static) the logic would look something like this:

myDataFrame = data.frame(Column2 = "sp|Q10CQ1|MAD14_ORYSJ")
myDataFrame$newCol1 = substring(myDataFrame$Column2,1,9)   # "sp|Q10CQ1" (stops before the second "|")
myDataFrame$newCol2 = substring(myDataFrame$Column2,11,15) # "MAD14"
myDataFrame$newCol3 = substring(myDataFrame$Column2,17,21) # "ORYSJ"

Not overly elegant, and this assumes that the split positions are the same in each value, but hopefully this helps.

Upvotes: 1

user295691

Reputation: 7248

In base R, the strsplit function will operate on a vector of that form, but it returns a list, which you will have to simplify further.

The tidyr package has a function called separate that keeps the result as a data frame. That's probably preferable for this use case.

For example:

> library(tidyr)
> a <- data.frame(x=1:3, y=c("a|b|c", "b|c|d", "d|e|f"))
> a
  x     y
1 1 a|b|c
2 2 b|c|d
3 3 d|e|f
> separate(a, y, c("a","b","c"), '\\|')
  x a b c
1 1 a b c
2 2 b c d
3 3 d e f

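To flesh out the strsplit solution slightly, you would have to use a somewhat awkward combination of rbind and cbind to get there. The pieces of each value have to be stacked as rows with rbind; stacking them with cbind would transpose the result, so the pieces would end up in the wrong rows.

> cbind(a, do.call(rbind, strsplit(as.character(a$y), "\\|")))
  x     y 1 2 3
1 1 a|b|c a b c
2 2 b|c|d b c d
3 3 d|e|f d e f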
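As a rough base R sketch of the same idea on values shaped like the one in the question: the two-row data frame and the second value are made up for illustration, and this assumes every value contains exactly two pipes and one underscore. Splitting on the character class "[|_]" handles both delimiters in one pass, and rbind stacks the pieces of each value as one row.

```r
# Hypothetical stand-in for the question's column; the second value is invented
df <- data.frame(V2 = c("sp|Q10CQ1|MAD14_ORYSJ", "sp|P12345|ABC1_HUMAN"),
                 stringsAsFactors = FALSE)

# Split on either "|" or "_"; inside [...] the pipe is literal and needs no escaping
parts <- do.call(rbind, strsplit(df$V2, "[|_]"))

# Re-join the first two pieces with a pipe and keep the last two as their own columns
out <- data.frame(sp = paste(parts[, 1], parts[, 2], sep = "|"),
                  gene = parts[, 3],
                  organism = parts[, 4],
                  stringsAsFactors = FALSE)
out
```

If some values had a different number of pieces, rbind would recycle the shorter vectors with a warning, so it would be worth checking lengths(strsplit(...)) first.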

EDIT: I should also note that if you use the tidyr approach, you will have to apply it in several steps, possibly together with unite, to get the complete behavior. Something like:

df <- separate(df, col, c("type", "subtype", "rawclass"), "\\|")
df <- separate(df, rawclass, c("class", "subclass"), "_")
df <- unite(df, sp, type, subtype, sep="|")

Assuming that the original column is called col, and with made-up names for the final headers.
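Applied to the question's own names, those steps might look like the sketch below. The one-row data frame is a made-up stand-in for whatever read.table returns, built from the sample value "sp|Q10CQ1|MAD14_ORYSJ". The separator has to be a quoted string with the backslashes inside the quotes, "\\|". Writing \\"|" puts the backslashes outside the string, which is most likely what causes the "unexpected input" error in the question.

```r
library(tidyr)

# Hypothetical one-row stand-in for the data read from test38kbat.txt
t38kbat <- data.frame(V1 = 1, V2 = "sp|Q10CQ1|MAD14_ORYSJ",
                      stringsAsFactors = FALSE)

# Split on the literal pipe; "|" is a regex metacharacter, so it is escaped
t38kbat <- separate(t38kbat, V2, c("sp", "id", "gene_organism"), sep = "\\|")

# Split the last piece on the underscore (no escaping needed)
t38kbat <- separate(t38kbat, gene_organism, c("gene", "organism"), sep = "_")

# Glue the first two pieces back together with a pipe
t38kbat <- unite(t38kbat, sp, sp, id, sep = "|")
t38kbat
```

The result should have the columns V1, sp, gene and organism, with sp holding the re-joined "sp|Q10CQ1".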

Upvotes: 2
