Reputation: 593
I have a data frame with a column service-id, among many other columns:
service-id
ids-1-2-3-4-5
ids-1-2-6
ids-5
ids-7-8
I want to split values such as ids-1-2-3-4-5 into separate indicator columns, as in one-hot encoding: one column for each of the IDs 1 to 8, holding 1 if that ID is present in the row and 0 otherwise.
col.1 col.2 col.3 col.4 col.5 col.6 ..... col.8
1 1 1 1 1 0 ..... 0 for ids-1-2-3-4-5
1 1 0 0 0 1 ...... 0 for ids-1-2-6
I tried the tidyverse, but could not get it to work.
Upvotes: 1
Views: 116
Reputation: 887671
If we need a tidyverse option, here is one way:
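library(tidyverse)
df1 %>%
  # keep a row identifier so each original row stays separate
  rownames_to_column('rn') %>%
  # split "ids-1-2-3" into the prefix ("ids") and the rest ("1-2-3")
  extract(service.id, into = c('id', 'col'), "^([^-]+)-(.*)") %>%
  # one row per individual id
  separate_rows(col) %>%
  mutate(n = 1, col = paste0("col.", col)) %>%
  # reshape to wide format, filling absent ids with 0
  spread(col, n, fill = 0) %>%
  select(-rn, -id)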
# col.1 col.2 col.3 col.4 col.5 col.6 col.7 col.8
#1 1 1 1 1 1 0 0 0
#2 1 1 0 0 0 1 0 0
#3 0 0 0 0 1 0 0 0
#4 0 0 0 0 0 0 1 1
Data:
df1 <- structure(list(service.id = c("ids-1-2-3-4-5", "ids-1-2-6", "ids-5",
    "ids-7-8")), .Names = "service.id", class = "data.frame", row.names = c(NA,
    -4L))
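In recent tidyr releases, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea with pivot_wider(), assuming tidyr 1.0 or later and dplyr are installed; the helper column names rn, col and n and the result name res are only illustrative. It sorts by the numeric id before reshaping so that the columns come out as col.1 to col.8.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  service.id = c("ids-1-2-3-4-5", "ids-1-2-6", "ids-5", "ids-7-8"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(rn  = row_number(),                        # remember the original row
         col = sub("^[^-]+-", "", service.id)) %>%  # drop the "ids-" prefix
  separate_rows(col, sep = "-") %>%                 # one row per id
  mutate(col = as.integer(col), n = 1) %>%
  arrange(col) %>%                                  # columns appear in numeric order
  pivot_wider(id_cols = rn, names_from = col, values_from = n,
              names_prefix = "col.", values_fill = 0) %>%
  arrange(rn) %>%                                   # restore the original row order
  select(-rn)

res
```

Converting col to integer before sorting avoids the alphabetical ordering (col.1, col.10, col.2, ...) that would otherwise appear once the ids reach two digits.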
Upvotes: 1
Reputation: 5017
A solution using base R.
Your data
# data.frame() converts the name "service-id" to the syntactic name service.id
db <- data.frame("service-id" = c("ids-1-2-3-4-5", "ids-1-2-6", "ids-5", "ids-7-8"))
Identify number of columns
ncol <- max(suppressWarnings(as.numeric(unlist(strsplit(as.character(db$service.id), "-")))), na.rm = TRUE)
Extract numeric id list
number_list <- strsplit(as.character(db$service.id), "-")
number_list <- suppressWarnings(lapply(number_list, as.numeric))
number_list <- lapply(number_list, function(x) x[!is.na(x)])
Create output dataframe
f <- function(x, ncol)
{
  # 1 where the position is among the ids in x, 0 otherwise
  return(as.numeric(seq_len(ncol) %in% x))
}
out <- t(data.frame(lapply(number_list, f, ncol = ncol)))
colnames(out) <- paste0("col.", seq_len(ncol))
rownames(out) <- NULL
Your output
out
col.1 col.2 col.3 col.4 col.5 col.6 col.7 col.8
[1,] 1 1 1 1 1 0 0 0
[2,] 1 1 0 0 0 1 0 0
[3,] 0 0 0 0 1 0 0 0
[4,] 0 0 0 0 0 0 1 1
Upvotes: 1