How to tidy up a character column?

Question

What I have:

test_df <- data.frame(isolate=c(1,2,3,4,1,2,3,4,5),label=c(1,1,1,1,2,2,2,2,2),alignment=c("--at","at--","--at","--at","a--","acg","a--","a--", "agg"))

> test_df
  isolate label alignment
1       1     1   --at
2       2     1   at--
3       3     1   --at
4       4     1   --at
5       1     2   a--
6       2     2   acg
7       3     2   a--
8       4     2   a--
9       5     2   agg

What I want:

I'd like to explode the alignment field into two columns, position and character:

> test_df
  isolate label aln_pos  aln_char
1       1     1       1  -
2       1     1       2  -
3       1     1       3  a
4       1     1       4  t
...

Not all alignments are the same length, but all alignments with the same label have the same length.

What I've tried:

I was thinking I could use separate to first give each position its own column, then use gather to turn those columns into key-value pairs. However, I haven't been able to get the separate part right.

Seth Wenchel · Accepted Answer

Since you mentioned tidyr::gather, you could try this:

test_df <- data.frame(isolate=c(1,2,3,4,1,2,3,4,5),
                      label=c(1,1,1,1,2,2,2,2,2),
                      alignment=c("--at","at--","--at","--at","a--","acg","a--","a--", "agg"), 
                      stringsAsFactors = FALSE)

library(tidyverse)

test_df %>% 
  mutate(alignment = strsplit(alignment,"")) %>% 
  unnest(alignment)
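
The pipeline above gives one row per character but no position column. If you also need the aln_pos / aln_char layout from the question, one possible extension is sketched below. It assumes that each (isolate, label) pair identifies exactly one original row, which holds for test_df, so numbering the rows within each pair recovers the position inside the string. The names tidy_df, aln_pos and aln_char are only illustrative (the latter two are taken from the desired output in the question).

```r
library(dplyr)
library(tidyr)

test_df <- data.frame(
  isolate = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
  label = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  alignment = c("--at", "at--", "--at", "--at", "a--", "acg", "a--", "a--", "agg"),
  stringsAsFactors = FALSE
)

tidy_df <- test_df %>%
  # split each alignment string into a vector of single characters
  mutate(aln_char = strsplit(alignment, "")) %>%
  select(-alignment) %>%
  # one row per character; the original row order is kept
  unnest(aln_char) %>%
  # assumption: (isolate, label) is unique per original row,
  # so the row number within the group is the position in the string
  group_by(isolate, label) %>%
  mutate(aln_pos = row_number()) %>%
  ungroup() %>%
  select(isolate, label, aln_pos, aln_char)

head(tidy_df, 4)
```

If the same (isolate, label) pair could appear on more than one row in your real data, add a row id before splitting (for example with mutate(row_id = row_number())) and group by that instead.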
