Reputation: 497
I have a dataset which looks like this:
  col1
1 ATOM 1 N ILE A 12 67.611 47.640 52.312 1.00 12.44 N
2 ATOM 2 CA ILE A 12 66.381 47.660 51.520 1.00 25.25 C
It has a single column called col1. I want to separate it into 12 columns, for which I'm using the following command:
try=separate(subset,col1,c("name","S.No","Atom Name","Residue Name","Symbol","Residue Number","X-cor","Y-cor","Z-cor","Uk1","Uk2","Symbol"), sep= " ")
But I keep getting the following warning, which I do not understand:
Warning message: Too many values at 3929 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
And it gives me the following output:
name S.No Atom Name Residue Name Symbol Residue Number X-cor Y-cor Z-cor Uk1 Uk2 Symbol
1 ATOM 1 N ILE
2 ATOM 2 CA ILE A
Any help fixing this is highly appreciated. Thanks!
Upvotes: 3
Views: 3822
Reputation: 1
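I have faced the same problem.
Solution: don't pass "sep" if you want to split two values (characters or anything else) joined by ".". The default sep of separate() is a regular expression that already splits on any run of non-alphanumeric characters.
Reference: see the examples in the documentation of separate():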
> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df %>% separate(x, c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c
# Reason for the warning: the pattern is treated as a regular expression, in which "." matches any character:
> x="Sepal.Width"
> strsplit(x,split=".")
[[1]]
[1] "" "" "" "" "" "" "" "" "" "" ""
> str_detect(x,".")
[1] TRUE
> str_replace(x,".","_")
[1] "_epal.Width"
> str_replace_all(x,".","_")
[1] "___________"
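To make the point above concrete, here is a minimal base R sketch of two ways to match a literal dot instead of "any character": either escape it, or pass fixed = TRUE. The variable names parts_escaped, parts_fixed and replaced are only illustrative.

```r
x <- "Sepal.Width"

# Escape the dot so the regex engine treats it as a literal "."
parts_escaped <- strsplit(x, "\\.")[[1]]

# Or switch off regex interpretation entirely
parts_fixed <- strsplit(x, ".", fixed = TRUE)[[1]]

# The same idea for replacement: only the literal dot is replaced
replaced <- sub(".", "_", x, fixed = TRUE)

print(parts_escaped)  # "Sepal" "Width"
print(parts_fixed)    # "Sepal" "Width"
print(replaced)       # "Sepal_Width"
```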
Upvotes: 0
Reputation: 5951
There should be a more elegant solution with tidyr. But without that library, this is what I have:
data.frame(do.call(rbind, unlist(apply(subset, 1, function(x) strsplit(x, "\\s+")),recursive=FALSE)))
I am assuming your data set is named subset. For each row of the data.frame, you split it on one or more whitespace characters, which is the strsplit(x, "\\s+") part. The rest is basically to get it into a data.frame.
Just figured it out: in your code, just replace sep = " " with sep = "\\s+". The \\s+ means at least one whitespace character, whereas your " " is exactly one space.
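A minimal sketch of the tidyr version, under some assumptions: the data frame name pdb is hypothetical, the two rows are copied from the question with extra spaces added to mimic fixed-width output, and the fifth column is renamed to "Chain" here only to avoid having "Symbol" twice in the column names.

```r
library(tidyr)

# Two rows from the question; the runs of spaces are an assumption
# about how the original file is laid out
pdb <- data.frame(
  col1 = c(
    "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N",
    "ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C"
  ),
  stringsAsFactors = FALSE
)

# "Chain" is an illustrative name, not one used in the question
cols <- c("name", "S.No", "Atom Name", "Residue Name", "Chain",
          "Residue Number", "X-cor", "Y-cor", "Z-cor", "Uk1", "Uk2", "Symbol")

# sep = "\\s+" treats any run of whitespace as a single delimiter,
# so each row yields exactly 12 pieces
out <- separate(pdb, col1, into = cols, sep = "\\s+")
print(out)
```

With sep = " ", every extra space would produce an empty piece, which is likely where the "Too many values" warning comes from.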
Upvotes: 4