Reputation: 6756
I have a data.frame column that has values such as below. I want to use each cell and create two columns- num1 and num2 such that num1=everything before "-" and num2=everything between "-" and "."
I am thinking of using gregexpr function as shown here and write a for loop to iterate over each row. Is there a faster way to do this?
60-150.PNG
300-12.PNG
employee <- c('60-150.PNG','300-12.PNG')
employ.data <- data.frame(employee)
Upvotes: 3
Views: 521
Reputation: 887088
Try
library(tidyr)
extract(employ.data, employee, into=c('num1', 'num2'),
'([^-]*)-([^.]*)\\..*', convert=TRUE)
# num1 num2
#1 60 150
#2 300 12
Or
library(data.table)#v1.9.5+
setDT(employ.data)[, tstrsplit(employee, '[-.]', type.convert=TRUE)[-3]]
# V1 V2
#1: 60 150
#2: 300 12
Or based on @rawr's comment
read.table(text=gsub('-|.PNG', ' ', employ.data$employee),
col.names=c('num1', 'num2'))
# num1 num2
#1 60 150
#2 300 12
To keep the original column
extract(employ.data, employee, into=c('num1', 'num2'), remove=FALSE,
'([^-]*)-([^.]*)\\..*', convert=TRUE)
# employee num1 num2
#1 60-150.PNG 60 150
#2 300-12.PNG 300 12
Or
setDT(employ.data)[, paste0('num', 1:2) := tstrsplit(employee,
'[-.]', type.convert=TRUE)[-3]]
# employee num1 num2
#1: 60-150.PNG 60 150
#2: 300-12.PNG 300 12
Or
cbind(employ.data, read.table(text=gsub('-|.PNG', ' ',
employ.data$employee),col.names=c('num1', 'num2')))
# employee num1 num2
#1 60-150.PNG 60 150
#2 300-12.PNG 300 12
Upvotes: 5
Reputation: 92282
Another option using stringi
library(stringi)
data.frame(type.convert(stri_split_regex(employee, "[-.]", simplify = TRUE)[, -3]))
# X1 X2
# 1 60 150
# 2 300 12
Upvotes: 3
Reputation: 5951
Or with the simple gsub
.
gsub("-.*", "", employ.data$employee) # substitute everything after - with nothing
gsub(".*-(.*)\\..*", "\\1", employ.data$employee) #keep only anything between - and .
Upvotes: 2
Reputation: 193517
You can try cSplit
from my "splitstackshape" package:
library(splitstackshape)
cSplit(employ.data, "employee", "-|.PNG", fixed = FALSE)
# employee_1 employee_2
# 1: 60 150
# 2: 300 12
Since you mention gregexpr
, you can probably try something like:
do.call(rbind,
regmatches(as.character(employ.data$employee),
gregexpr("-|.PNG", employ.data$employee),
invert = TRUE))[, -3]
[,1] [,2]
[1,] "60" "150"
[2,] "300" "12"
Upvotes: 3
Reputation: 4251
The strsplit
function will give you what you're looking for, output to a list.
employee <- c('60-150.PNG','300-12.PNG')
strsplit(employee, "[-]")
##Output:
[[1]]
[1] "60" "150.PNG"
[[2]]
[1] "300" "12.PNG"
Note the second argument to strsplit
is a regex value, not just a character to split on, so more complicated regexp can be used.
Upvotes: 1