Raghav Mehta
Raghav Mehta

Reputation: 17

String Splitting a column into multiple columns

Hi I have question regarding R Programming, I am a newbie in R. I have a dataset in excel with a particular column having values as such.

123456
123456789
123456789123

Now my requirement is to get values in multiples of 3 and split into different columns. For eg. My first row would be splitting into 2 columns and second row into 3 columns

colA colB colC
123  456
123  456  789

The desired output:

enter link description here

Upvotes: 1

Views: 169

Answers (4)

G. Grothendieck
G. Grothendieck

Reputation: 269566

Here are a few solutions. The first 5 do not use any packages. nc (number of columns) and cn (column names) defined in (1) are used in the others as well.

1) read.fwf Using the input DF shown reproducibly in the Note at the end count the maximum number of characters in a line and divide by 3 to get the number of columns nc. Next compute the column names cn. Finally use read.fwf to read them in. No packages are used.

nc <- max(nchar(DF[[1]]))/3
cn <- paste0("col", head(LETTERS, nc))
read.fwf(textConnection(as.character(DF[[1]])), rep(3, length = nc),
   col.names = cn)

giving:

  colA colB colC colD
1  123  456   NA   NA
2  123  456  789   NA
3  123  456  789  123

2) formatC A variation on the above would be to use formatC to insert commas after every 3 characters giving the character vector ch and then read that in using read.csv.

ch <- formatC(DF[[1]], format= "f", digits = 0, big.mark = ",")
read.csv(text = ch, header = FALSE, col.names = cn)

3) strsplit Another variation would be to split the column using strsplit and the indicated regular expression to split by and then use toString to put the split components into a comma separated string vector, ch. Finally use read.csv as before.

ch <- sapply(strsplit(as.character(DF[[1]]), "(?<=...)", perl = TRUE), toString)
read.csv(text = ch, header = FALSE, col.names = cn)

4) gsub Yet another variation is to use gsub to insert commas every 3 characters and then use read.csv as in (2) and (3).

ch <- gsub("(...)(?=.)", "\\1,", DF[[1]], perl = TRUE)
read.csv(text = ch, header = FALSE, col.names = cn)

5) strcapture This one does not use any read.* routine. It also uses only base R.

strcapture(strrep("(...)?", nc), DF[[1]], setNames(double(nc), cn))

6) strapplyc This is the only variation that uses a package. strapplyc can be used to pick off successive 3 character subsets. It uses a simpler regular expression than some of our other solutions. read.csv is used as in some of the other solutions.

library(gsubfn)

ch <- sapply(strapplyc(DF[[1]], "..."), toString)
read.csv(text = ch, header = FALSE, col.names = cn)

Note

The input in reproducible form:

Lines <- "
123456
123456789
123456789123"
DF <- read.table(text = Lines)

Upvotes: 4

Andre Elrico
Andre Elrico

Reputation: 11480

using library data.table

library(data.table)
setDT(df1)
df1[, tstrsplit(df1$col1, "(?:.{3}+\\K)", perl = TRUE)] # change {3} to other numbers if you don't want to split after every 3.

#    V1  V2   V3   V4
#1: 123 456 <NA> <NA>
#2: 123 456  789 <NA>
#3: 123 456  789  123

data:

df1<-
structure(list(col1 = c("123456", "123456789", "123456789123"
)), class = c("data.table", "data.frame"), row.names = c(NA, -3L))

Upvotes: 1

akrun
akrun

Reputation: 887108

Here is one option with separate

library(tidyverse)
df %>% 
  separate(a, into = c('b', 'c', 'd'), sep= c(3, 6), remove = FALSE)
#          a   b   c   d
#1       123 123        
#2    123456 123 456    
#3 123456789 123 456 789

Using convert=TRUE, changes the type of the column automatically

df %>% 
   separate(a, into = c('b', 'c', 'd'), sep= c(3, 6), 
          remove = FALSE, convert = TRUE)

data

df <- data.frame (a = c(123,123456,123456789))

Upvotes: 3

NColl
NColl

Reputation: 757

There's probably a method that involves less repetition but one option may be

library(tidyverse)

df <- data.frame (a = c(123,123456,123456789))

df %>%
  mutate(b = substr(a, 0,3),
         c = substr(a, 4,6),
         d = substr(a, 7,9))

          a   b   c   d
1       123 123        
2    123456 123 456    
3 123456789 123 456 789

Upvotes: 0

Related Questions