Reputation: 485
I have a data frame, and I want to spli their columns if they contain different strings or words.
I am tryng different methods in R and it is not working
My data frame looks like this:
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), y = c("TAP1", "TAP2", "TAP2", "TAP3" ))
And, for example, I am trying with the first column this:
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"))
df %>% separate(x, c("TAP1", "TAP2"), extra = "drop", fill = "right")
but is not working
I am having the next output:
TAP1 TAP2
1 <NA> <NA>
2 TAP1 <NA>
3 TAP1 <NA>
4 TAP2 <NA>
My expected output is:
TAP1 TAP2
1 <NA> <NA>
2 TAP1 <NA>
3 TAP1 <NA>
4 <NA> TAP2
And I would like to do the same for all columns in the complete data frame where I have different combinations of words like TAP1, TAP2, TAP3 ... etc.
In this example, the final table taking in to account column x and y would be.
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"), y = c("TAP1", "TAP2", "TAP2", "TAP3" ))
TAP1 TAP2 TAP1.1 TAP2.2 TAP3.3
1 <NA> <NA> TAP1 <NA> <NA>
2 TAP1 <NA> <NA> TAP2 <NA>
3 TAP1 <NA> <NA> TAP2 <NA>
4 <NA> TAP2 <NA> <NA> TAP3
Upvotes: 2
Views: 61
Reputation: 39154
A solution using the tidyverse
and the dummies
package. df3
is the final output.
library(tidyverse)
library(dummies)
df2 <- dummy.data.frame(df) %>% select(-ends_with("NA"))
cols <- str_remove(names(df2), regex("^x|^y"))
df3 <- modify2(df2, cols, ~ifelse(.x == 0, NA, .y))
df3
# xTAP1 xTAP2 yTAP1 yTAP2 yTAP3
# 1 <NA> <NA> TAP1 <NA> <NA>
# 2 TAP1 <NA> <NA> TAP2 <NA>
# 3 TAP1 <NA> <NA> TAP2 <NA>
# 4 <NA> TAP2 <NA> <NA> TAP3
Upvotes: 1
Reputation: 12074
Here's a base solution. This goes through all possible factors in your column (i.e., TAP1
, TAP2
) and checks whereabouts they're present. For places where they are present, it returns the name of the level. For places where they're absent, it returns NA
. Then, I repackage the resulting list into a data frame and rename the columns.
# Original data frame
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"))
# Repackage
df2 <- data.frame(lapply(levels(df$x), function(x)ifelse(df$x == x, x, NA)))
# Fix names
names(df2) <- levels(df$x)
# Check results
df2
#> TAP1 TAP2
#> 1 <NA> <NA>
#> 2 TAP1 <NA>
#> 3 TAP1 <NA>
#> 4 <NA> TAP2
Created on 2019-05-29 by the reprex package (v0.3.0)
In light of your update:
# Original data frame
df <- data.frame(x = c(NA, "TAP1", "TAP1", "TAP2"),
y = c("TAP1", "TAP2", "TAP2", "TAP3" ))
# Define splitter function
splitter <- function(foo){
tmp <- data.frame(lapply(levels(foo), function(x)ifelse(foo == x, x, NA)))
names(tmp) <- levels(foo)
tmp
}
# Run over data frame and bind together
do.call(cbind, lapply(df, splitter))
#> x.TAP1 x.TAP2 y.TAP1 y.TAP2 y.TAP3
#> 1 <NA> <NA> TAP1 <NA> <NA>
#> 2 TAP1 <NA> <NA> TAP2 <NA>
#> 3 TAP1 <NA> <NA> TAP2 <NA>
#> 4 <NA> TAP2 <NA> <NA> TAP3
Created on 2019-05-29 by the reprex package (v0.3.0)
Same rationale as before, but I define a function that is applied to each column and the results are bound together using do.call
and cbind
.
Upvotes: 1
Reputation: 886938
We can do this with spread
library(tidyverse)
df %>%
mutate(n = row_number()) %>%
group_by(x) %>%
mutate(rn = row_number(), y = x) %>%
spread(y, x) %>%
select(TAP1, TAP2)
# A tibble: 4 x 2
# TAP1 TAP2
# <fct> <fct>
#1 <NA> <NA>
#2 TAP1 <NA>
#3 TAP1 <NA>
#4 <NA> TAP2
With multiple columns, we can gather
and spread
rownames_to_column(df, 'rn') %>%
gather(key, val, -rn) %>%
mutate(val1 = val) %>%
unite(val, val,key) %>%
group_by(val) %>% # not really need for this example
mutate(ind = row_number()) %>% # not needed here though
spread(val, val1) %>%
select(starts_with("TAP"))
# A tibble: 4 x 5
# TAP1_x TAP1_y TAP2_x TAP2_y TAP3_y
# <chr> <chr> <chr> <chr> <chr>
#1 <NA> TAP1 <NA> <NA> <NA>
#2 TAP1 <NA> <NA> TAP2 <NA>
#3 TAP1 <NA> <NA> TAP2 <NA>
#4 <NA> <NA> TAP2 <NA> TAP3
Upvotes: 3