Reputation: 45
I also want to split the same column in the same way. I tried the following, but it is not working properly.
The code I used is:
t38kbat = read.table("test38kbat.txt", header = FALSE)
head(t38kbat)
t38kbat <- separate (t38kbat, V2, c("sp", "id", "gene_organism"), \\"|")
t38kbat <- separate (t38kbat, gene_organism, c("gene", "organism"), \\"_")
t38kbat <- unite (t38kbat, sp, sp, id, sep = "|")
When I run the script, I receive the error:
Error: unexpected input in "t38kbat <- separate (t38kbat, V2, c("sp", "id", "gene_organism"), \"
Can anybody guide me on how to resolve it? Thanks.
Upvotes: 3
Views: 165
Reputation: 3282
It seems to me (since there is not a single common delimiter to split on) that substring() might be helpful to you. substring() takes a starting position and an ending position; if these are predictable (and static), the logic would look something like this:
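myDataFrame = data.frame(Column2 = "sp|Q10CQ1|MAD14_ORYSJ")
# characters 1-9 are "sp|Q10CQ1" (position 10 is the second "|")
myDataFrame$newCol1 = substring(myDataFrame$Column2,1,9)
# characters 11-15 are "MAD14" (position 16 is the "_")
myDataFrame$newCol2 = substring(myDataFrame$Column2,11,15)
# characters 17-21 are "ORYSJ"
myDataFrame$newCol3 = substring(myDataFrame$Column2,17,21)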
Not overly elegant, and this assumes that the split positions are the same in each value, but hopefully this helps.
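If the positions are not static, one possible alternative is to capture the pieces with a regular expression instead of fixed offsets. This is only a sketch in base R: the second sample value and the name `pattern` are made up for illustration, and it assumes every value has exactly two "|" characters followed by a part containing a "_".

```r
# Hypothetical sample values; the second one is invented to show that
# the pieces do not need to sit at the same character positions.
myDataFrame <- data.frame(
  Column2 = c("sp|Q10CQ1|MAD14_ORYSJ", "sp|P12345|ABC_HUMAN"),
  stringsAsFactors = FALSE
)

# Group 1: everything up to (not including) the second "|"
# Group 2: the text between the second "|" and the first "_" after it
# Group 3: everything after that "_"
pattern <- "^([^|]*\\|[^|]*)\\|([^_]*)_(.*)$"

myDataFrame$newCol1 <- sub(pattern, "\\1", myDataFrame$Column2)
myDataFrame$newCol2 <- sub(pattern, "\\2", myDataFrame$Column2)
myDataFrame$newCol3 <- sub(pattern, "\\3", myDataFrame$Column2)
```

Values that do not match the pattern are returned unchanged by sub(), so it is worth checking for those before relying on the new columns.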
Upvotes: 1
Reputation: 7248
In base R, the strsplit command will operate on a vector of that form, but produces a list, which you will have to simplify further.
In the tidyr package, there's a separate function that will preserve the data frame nature of things. That's probably preferable for this use case.
For example:
> library(tidyr)
> a <- data.frame(x=1:3, y=c("a|b|c", "b|c|d", "d|e|f"))
> a
x y
1 1 a|b|c
2 2 b|c|d
3 3 d|e|f
> separate(a, y, c("a","b","c"), '\\|')
x a b c
1 1 a b c
2 2 b c d
3 3 d e f
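To flesh out the strsplit solution slightly, you would have to use a somewhat awkward combination of rbind and cbind to get there. The pieces of each row are stacked with rbind (using cbind here would transpose them), and the result is then bound to the original data frame:
> cbind(a, do.call(rbind, strsplit(as.character(a$y), "\\|")))
  x     y 1 2 3
1 1 a|b|c a b c
2 2 b|c|d b c d
3 3 d|e|f d e f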
EDIT: I should also note that if you use the tidyr approach, you will have to apply it repeatedly, possibly with unite, to get the complete behavior. Something like:
df <- separate(df, col, c("type", "subtype", "rawclass"), "\\|")
df <- separate(df, rawclass, c("class", "subclass"), "_")
df <- unite(df, sp, type, subtype, sep="|")
Assuming that the original column is called col, and with made-up names for the final headers.
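Applied to the code in the question, the "unexpected input" error comes from writing the backslashes outside the quotes (\\"|"), which R cannot parse; the escape belongs inside the string, as "\\|". Below is a minimal sketch of the corrected steps. The two-row data frame stands in for test38kbat.txt (its contents are not shown in the question, so the second row is invented), and sp_id is a made-up name for the recombined column.

```r
library(tidyr)

# Stand-in for read.table("test38kbat.txt", header = FALSE);
# the values are hypothetical examples of the format in question.
t38kbat <- data.frame(
  V1 = 1:2,
  V2 = c("sp|Q10CQ1|MAD14_ORYSJ", "sp|P12345|ABC_HUMAN"),
  stringsAsFactors = FALSE
)

# "|" is a regex metacharacter, so it is escaped inside the string.
t38kbat <- separate(t38kbat, V2, c("sp", "id", "gene_organism"), sep = "\\|")

# "_" has no special meaning in a regex and needs no escaping.
t38kbat <- separate(t38kbat, gene_organism, c("gene", "organism"), sep = "_")

# Glue the first two pieces back together with a literal "|".
t38kbat <- unite(t38kbat, sp_id, sp, id, sep = "|")
```

If some gene names contain extra underscores, the second separate call will warn about additional pieces, so the data is worth checking for that case first.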
Upvotes: 2