Shubh

Reputation: 593

How to split the data in each record in R?

I have a dataframe which has a column,

service-id       
ids-1-2-3-4-5
ids-1-2-6
ids-5
ids-7-8

with many other columns. I want to split each value, such as ids-1-2-3-4-5, into separate indicator columns col.1 to col.8, like one-hot encoding: a column holds 1 if that number appears in the record and 0 if it does not.

col.1 col.2 col.3 col.4 col.5 col.6 ..... col.8
1     1     1     1     1     0     ..... 0       for ids-1-2-3-4-5
1     1     0     0     0     1     ..... 0       for ids-1-2-6

I tried the tidyverse but could not get it to work.

Upvotes: 1

Views: 116

Answers (2)

akrun

Reputation: 887671

If we need a tidyverse option, here is one way:

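library(tidyverse)
df1 %>%
   rownames_to_column('rn') %>%                                      # keep a key for each original row
   extract(service.id, into = c('id', 'col'), "^([^-]+)-(.*)") %>%   # split the "ids" prefix from "1-2-3-4-5"
   separate_rows(col) %>%                                            # one row per number
   mutate(n = 1, col = paste0("col.", col)) %>%                      # indicator value and column label
   spread(col, n, fill = 0) %>%                                      # one column per label, 0 where absent
   select(-rn, -id)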
#  col.1 col.2 col.3 col.4 col.5 col.6 col.7 col.8
#1     1     1     1     1     1     0     0     0
#2     1     1     0     0     0     1     0     0
#3     0     0     0     0     1     0     0     0
#4     0     0     0     0     0     0     1     1

data

df1 <- structure(list(service.id = c("ids-1-2-3-4-5", "ids-1-2-6", "ids-5", 
 "ids-7-8")), .Names = "service.id", class = "data.frame", row.names = c(NA, 
 -4L))
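
As a side note, tidyr documents spread() as superseded by pivot_wider(). Below is a rough sketch of the same idea with pivot_wider(), not a drop-in replacement. It assumes a recent tidyr (scalar values_fill needs 1.1.0 or later, as far as I know), and the name res is just a placeholder. Unlike spread(), pivot_wider() orders the new columns by first appearance rather than sorting them, which happens to give col.1 to col.8 for this data.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(service.id = c("ids-1-2-3-4-5", "ids-1-2-6", "ids-5", "ids-7-8"))

res <- df1 %>%
  mutate(rn = row_number(),                          # key for each original row
         col = sub("^[^-]+-", "", service.id)) %>%   # drop the "ids-" prefix
  separate_rows(col, sep = "-") %>%                  # one row per number
  mutate(n = 1, col = paste0("col.", col)) %>%       # indicator value and column label
  select(-service.id) %>%                            # leave rn as the only id column
  pivot_wider(names_from = col, values_from = n, values_fill = 0) %>%
  select(-rn)
```

If the column order matters for other data, reordering the columns explicitly after the pivot is probably the safest choice.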

Upvotes: 1

Terru_theTerror

Reputation: 5017

A solution using base R.

Your data

db<-data.frame("service-id"=c("ids-1-2-3-4-5","ids-1-2-6","ids-5","ids-7-8"))

Identify number of columns

ncol <- max(suppressWarnings(as.numeric(unlist(strsplit(as.character(db$service.id), "-")))), na.rm = TRUE)

Extract numeric id list

number_list<-strsplit(as.character(db$service.id),"-")
number_list<-suppressWarnings(lapply(number_list,as.numeric))
number_list <- lapply(number_list, function(x) x[!is.na(x)])

Create output dataframe

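# Return a 0/1 vector of length ncol: 1 where the position appears in x
f <- function(x, ncol) {
    as.numeric(seq_len(ncol) %in% x)
}
# One row per record; rbind gives a numeric matrix
out <- do.call(rbind, lapply(number_list, f, ncol = ncol))
colnames(out) <- paste0("col.", seq_len(ncol))
rownames(out) <- NULL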

Your output

out
     col.1 col.2 col.3 col.4 col.5 col.6 col.7 col.8
[1,]     1     1     1     1     1     0     0     0
[2,]     1     1     0     0     0     1     0     0
[3,]     0     0     0     0     1     0     0     0
[4,]     0     0     0     0     0     0     1     1
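
Since the question mentions that the data frame has many other columns, here is a small sketch of attaching the indicator matrix back to the original data with cbind(). The column other and the names ids, n and res are made up for illustration, and the code assumes every value starts with a single prefix such as "ids" followed by "-"-separated integers.

```r
# 'other' is a hypothetical extra column standing in for the rest of the data
db <- data.frame(service.id = c("ids-1-2-3-4-5", "ids-1-2-6", "ids-5", "ids-7-8"),
                 other = c("a", "b", "c", "d"))

# Split on "-" and drop the first piece (the "ids" prefix)
ids <- lapply(strsplit(as.character(db$service.id), "-"),
              function(p) as.integer(p[-1]))
n <- max(unlist(ids))

# One 0/1 row per record, stacked into a matrix
out <- do.call(rbind, lapply(ids, function(x) as.numeric(seq_len(n) %in% x)))
colnames(out) <- paste0("col.", seq_len(n))

# Original columns first, indicator columns after them
res <- cbind(db, out)
```

Dropping the prefix by position avoids the coercion warnings that as.numeric("ids") would otherwise produce.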

Upvotes: 1
