Eric González
Eric González

Reputation: 485

Separate columns by their strings if they are different

I have a data frame, and I want to spli their columns if they contain different strings or words.

I am tryng different methods in R and it is not working

My data frame looks like this:

df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), y = c("TAP1", "TAP2", "TAP2", "TAP3" ))

And, for example, I am trying with the first column this:

df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"))
df %>% separate(x, c("TAP1", "TAP2"), extra = "drop", fill = "right")

but is not working

I am having the next output:

TAP1 TAP2
1 <NA> <NA>
2 TAP1 <NA>
3 TAP1 <NA>
4 TAP2 <NA>

My expected output is:

 TAP1 TAP2
1 <NA> <NA>
2 TAP1 <NA>
3 TAP1 <NA>
4 <NA> TAP2

And I would like to do the same for all columns in the complete data frame where I have different combinations of words like TAP1, TAP2, TAP3 ... etc.

In this example, the final table taking in to account column x and y would be.

 df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), y = c("TAP1", "TAP2",   "TAP2", "TAP3" ))

  TAP1 TAP2 TAP1.1 TAP2.2 TAP3.3
1 <NA> <NA> TAP1   <NA>   <NA>
2 TAP1 <NA> <NA>   TAP2   <NA>
3 TAP1 <NA> <NA>   TAP2   <NA>
4 <NA> TAP2 <NA>   <NA>   TAP3

Upvotes: 2

Views: 61

Answers (3)

www
www

Reputation: 39154

A solution using the tidyverse and the dummies package. df3 is the final output.

library(tidyverse)
library(dummies)

df2 <- dummy.data.frame(df) %>% select(-ends_with("NA"))

cols <- str_remove(names(df2), regex("^x|^y"))

df3 <- modify2(df2, cols, ~ifelse(.x == 0, NA, .y))

df3
#   xTAP1 xTAP2 yTAP1 yTAP2 yTAP3
# 1  <NA>  <NA>  TAP1  <NA>  <NA>
# 2  TAP1  <NA>  <NA>  TAP2  <NA>
# 3  TAP1  <NA>  <NA>  TAP2  <NA>
# 4  <NA>  TAP2  <NA>  <NA>  TAP3

Upvotes: 1

Dan
Dan

Reputation: 12074

Here's a base solution. This goes through all possible factors in your column (i.e., TAP1, TAP2) and checks whereabouts they're present. For places where they are present, it returns the name of the level. For places where they're absent, it returns NA. Then, I repackage the resulting list into a data frame and rename the columns.

# Original data frame
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"))

# Repackage
df2 <- data.frame(lapply(levels(df$x), function(x)ifelse(df$x == x, x, NA)))

# Fix names
names(df2) <- levels(df$x)

# Check results
df2
#>   TAP1 TAP2
#> 1 <NA> <NA>
#> 2 TAP1 <NA>
#> 3 TAP1 <NA>
#> 4 <NA> TAP2

Created on 2019-05-29 by the reprex package (v0.3.0)


In light of your update:

# Original data frame
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), 
                 y = c("TAP1", "TAP2",   "TAP2", "TAP3" ))

# Define splitter function
splitter <- function(foo){
  tmp <- data.frame(lapply(levels(foo), function(x)ifelse(foo == x, x, NA)))
  names(tmp) <- levels(foo)
  tmp
}

# Run over data frame and bind together
do.call(cbind, lapply(df, splitter))
#>   x.TAP1 x.TAP2 y.TAP1 y.TAP2 y.TAP3
#> 1   <NA>   <NA>   TAP1   <NA>   <NA>
#> 2   TAP1   <NA>   <NA>   TAP2   <NA>
#> 3   TAP1   <NA>   <NA>   TAP2   <NA>
#> 4   <NA>   TAP2   <NA>   <NA>   TAP3

Created on 2019-05-29 by the reprex package (v0.3.0)

Same rationale as before, but I define a function that is applied to each column and the results are bound together using do.call and cbind.

Upvotes: 1

akrun
akrun

Reputation: 886938

We can do this with spread

library(tidyverse)
df %>% 
   mutate(n = row_number()) %>% 
   group_by(x) %>% 
   mutate(rn = row_number(), y = x) %>%
   spread(y, x) %>% 
   select(TAP1, TAP2)
# A tibble: 4 x 2
#  TAP1  TAP2 
#  <fct> <fct>
#1 <NA>  <NA> 
#2 TAP1  <NA> 
#3 TAP1  <NA> 
#4 <NA>  TAP2 

With multiple columns, we can gather and spread

rownames_to_column(df, 'rn') %>%
   gather(key, val, -rn) %>%
   mutate(val1 = val) %>% 
   unite(val, val,key) %>% 
   group_by(val) %>%    # not really need for this example
   mutate(ind = row_number()) %>% # not needed here though
   spread(val, val1) %>%
   select(starts_with("TAP"))
# A tibble: 4 x 5
# TAP1_x TAP1_y TAP2_x TAP2_y TAP3_y
#  <chr>  <chr>  <chr>  <chr>  <chr> 
#1 <NA>   TAP1   <NA>   <NA>   <NA>  
#2 TAP1   <NA>   <NA>   TAP2   <NA>  
#3 TAP1   <NA>   <NA>   TAP2   <NA>  
#4 <NA>   <NA>   TAP2   <NA>   TAP3  

Upvotes: 3

Related Questions