Reputation: 17
Hi I have question regarding R Programming, I am a newbie in R. I have a dataset in excel with a particular column having values as such.
123456
123456789
123456789123
Now my requirement is to get values in multiples of 3 and split into different columns. For eg. My first row would be splitting into 2 columns and second row into 3 columns
colA colB colC
123 456
123 456 789
The desired output:
Upvotes: 1
Views: 169
Reputation: 269566
Here are a few solutions. The first 5 do not use any packages. nc
(number of columns) and cn
(column names) defined in (1) are used in the others as well.
1) read.fwf Using the input DF
shown reproducibly in the Note at the end count the maximum number of characters in a line and divide by 3 to get the number of columns nc
. Next compute the column names cn
. Finally use read.fwf
to read them in. No packages are used.
nc <- max(nchar(DF[[1]]))/3
cn <- paste0("col", head(LETTERS, nc))
read.fwf(textConnection(as.character(DF[[1]])), rep(3, length = nc),
col.names = cn)
giving:
colA colB colC colD
1 123 456 NA NA
2 123 456 789 NA
3 123 456 789 123
2) formatC A variation on the above would be to use formatC
to insert commas after every 3 characters giving the character vector ch
and then read that in using read.csv
.
ch <- formatC(DF[[1]], format= "f", digits = 0, big.mark = ",")
read.csv(text = ch, header = FALSE, col.names = cn)
3) strsplit Another variation would be to split the column using strsplit
and the indicated regular expression to split by and then use toString to put the split components into a comma separated string vector, ch
. Finally use read.csv
as before.
ch <- sapply(strsplit(as.character(DF[[1]]), "(?<=...)", perl = TRUE), toString)
read.csv(text = ch, header = FALSE, col.names = cn)
4) gsub Yet another variation is to use gsub
to insert commas every 3 characters and then use read.csv
as in (2) and (3).
ch <- gsub("(...)(?=.)", "\\1,", DF[[1]], perl = TRUE)
read.csv(text = ch, header = FALSE, col.names = cn)
5) strcapture This one does not use any read.* routine. It also uses only base R.
strcapture(strrep("(...)?", nc), DF[[1]], setNames(double(nc), cn))
6) strapplyc This is the only variation that uses a package. strapplyc
can be used to pick off successive 3 character subsets. It uses a simpler regular expression than some of our other solutions. read.csv
is used as in some of the other solutions.
library(gsubfn)
ch <- sapply(strapplyc(DF[[1]], "..."), toString)
read.csv(text = ch, header = FALSE, col.names = cn)
The input in reproducible form:
Lines <- "
123456
123456789
123456789123"
DF <- read.table(text = Lines)
Upvotes: 4
Reputation: 11480
using library data.table
library(data.table)
setDT(df1)
df1[, tstrsplit(df1$col1, "(?:.{3}+\\K)", perl = TRUE)] # change {3} to other numbers if you don't want to split after every 3.
# V1 V2 V3 V4
#1: 123 456 <NA> <NA>
#2: 123 456 789 <NA>
#3: 123 456 789 123
data:
df1<-
structure(list(col1 = c("123456", "123456789", "123456789123"
)), class = c("data.table", "data.frame"), row.names = c(NA, -3L))
Upvotes: 1
Reputation: 887108
Here is one option with separate
library(tidyverse)
df %>%
separate(a, into = c('b', 'c', 'd'), sep= c(3, 6), remove = FALSE)
# a b c d
#1 123 123
#2 123456 123 456
#3 123456789 123 456 789
Using convert=TRUE
, changes the type
of the column automatically
df %>%
separate(a, into = c('b', 'c', 'd'), sep= c(3, 6),
remove = FALSE, convert = TRUE)
df <- data.frame (a = c(123,123456,123456789))
Upvotes: 3
Reputation: 757
There's probably a method that involves less repetition but one option may be
library(tidyverse)
df <- data.frame (a = c(123,123456,123456789))
df %>%
mutate(b = substr(a, 0,3),
c = substr(a, 4,6),
d = substr(a, 7,9))
a b c d
1 123 123
2 123456 123 456
3 123456789 123 456 789
Upvotes: 0