Reputation: 497

Splitting rows into columns in R using tidyr

I have a dataset that looks like this:

                                                                              col1

1 ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N  
2 ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C

It has a single column called col1. I want to separate it into 12 columns, for which I'm using the following command:

try=separate(subset,col1,c("name","S.No","Atom Name","Residue Name","Symbol","Residue Number","X-cor","Y-cor","Z-cor","Uk1","Uk2","Symbol"), sep= " ")

But I keep getting the following warning, which I do not understand:

Warning message: Too many values at 3929 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...

And it gives me the following output:

name S.No Atom Name Residue Name Symbol Residue Number X-cor Y-cor Z-cor Uk1 Uk2 Symbol

1 ATOM                                                       1           N            ILE

2 ATOM                                                       2          CA     ILE      A

Any help fixing this is highly appreciated. Thanks!

Upvotes: 3

Answers (2)

Jeevan gona

Reputation: 1

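I have faced the same problem.

Solution: don't pass sep = "." when the values you want to split are joined by "." (or another punctuation character). sep is interpreted as a regular expression, in which "." matches any character; the default sep already splits on any run of non-alphanumeric characters.

Reference: see the examples in the documentation of separate().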

> library(tidyr)
> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df %>% separate(x, c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c

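Reason for the warning: a bare "." used as a pattern is a regular expression that matches every character, as these calls show:

> library(stringr)
> x="Sepal.Width"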
> strsplit(x,split=".")
[[1]]
[1] "" "" "" "" "" "" "" "" "" "" ""

> str_detect(x,".")
[1] TRUE
> str_replace(x,".","_")
[1] "_epal.Width"
> str_replace_all(x,".","_")
[1] "___________"
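
To split on a literal dot instead, the pattern has to be marked as fixed text or escaped. A minimal base R sketch, assuming the goal is to treat "." as a plain character rather than a regex wildcard (the variable names are only illustrative):

```r
x <- "Sepal.Width"

# fixed = TRUE makes strsplit() treat "." as a literal character
parts_fixed <- strsplit(x, ".", fixed = TRUE)[[1]]

# Escaping the dot keeps regex matching but matches only a literal "."
parts_escaped <- strsplit(x, "\\.")[[1]]

print(parts_fixed)    # "Sepal" "Width"
print(parts_escaped)  # "Sepal" "Width"
```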

Upvotes: 0

dimitris_ps

Reputation: 5951

There should be a more elegant solution with tidyr, but without that library this is what I have:

data.frame(do.call(rbind, unlist(apply(subset, 1, function(x) strsplit(x, "\\s+")),recursive=FALSE)))

Logic

I am assuming your data set is named subset. Each row of the data.frame is split on runs of whitespace, which is the strsplit(x, "\\s+") part. The rest just turns the resulting pieces back into a data.frame.
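
The same idea can be sketched on a small example. This is only an illustration, assuming a two-row stand-in for subset built from the rows shown in the question; it calls strsplit() on the column directly instead of going through apply(), and trims each row first so leading or trailing spaces do not create empty fields:

```r
# Hypothetical two-row stand-in for the asker's data frame `subset`
subset <- data.frame(
  col1 = c(
    "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N  ",
    "ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C"
  ),
  stringsAsFactors = FALSE
)

# Split each trimmed row on runs of whitespace; each element is a character vector
fields <- strsplit(trimws(subset$col1), "\\s+")

# Stack the vectors row-wise (every row must yield the same number of fields)
result <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)

print(dim(result))  # 2 rows, 12 columns
```

The columns come out as character vectors named V1 to V12, so numeric fields still need converting (for example with as.numeric()).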

Update

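Just figured it out: in your code, replace sep = " " with sep = "\\s+". The pattern \\s+ matches one or more whitespace characters, whereas your " " matches exactly one space, so every extra space in a run of spaces produces an additional empty field.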
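
A minimal sketch of that fix with tidyr, assuming a two-row stand-in for the question's data frame and made-up, distinct column names (the original call repeats "Symbol"). Trimming the column first is an extra precaution added here, so that trailing spaces do not yield an extra empty piece:

```r
library(tidyr)

# Hypothetical two-row stand-in for the asker's data frame `subset`
subset <- data.frame(
  col1 = c(
    "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N  ",
    "ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C"
  ),
  stringsAsFactors = FALSE
)

# Illustrative column names, chosen to be unique
cols <- c("name", "serial", "atom_name", "residue_name", "chain",
          "residue_number", "x", "y", "z", "uk1", "uk2", "element")

# Remove leading/trailing whitespace, then split on runs of whitespace
subset$col1 <- trimws(subset$col1)
out <- separate(subset, col1, into = cols, sep = "\\s+")

print(out)
```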

Upvotes: 4
