Reputation: 35
I am reading data from a csv file and one of the columns in the data comes in three different formats:
xxxxx-xxx-xx (5-3-2)
xxxxx-xxxx-x (5-4-1)
xxxx-xxxx-xx (4-4-2)
My goal is to turn these three different styles into one style in the form: xxxxx-xxxx-xx (5-4-2)
To make all the forms the same, I need to insert an additional zero at a specific location in each of the three cases, like so:
xxxxx-0xxx-xx
xxxxx-xxxx-0x
0xxxx-xxxx-xx
Anyone have thoughts on the best way to accomplish this?
Upvotes: 2
Views: 526
Reputation: 55695
A slightly shorter, more functional-programming-style version of Justin's solution:
numbers <- c('11111-111-11', '11111-1111-1', '1111-1111-11')
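restyle <- function(number, fmt) {
  # Split on hyphens and convert each segment to an integer
  tmp <- as.list(as.integer(strsplit(number, '-')[[1]]))
  # Call sprintf with the segments as arguments plus the named format string
  do.call(sprintf, modifyList(tmp, list(fmt = fmt)))
}
sapply(numbers, restyle, fmt = '%05d-%04d-%02d', USE.NAMES = FALSE)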
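Since the question starts from a CSV file, here is a minimal sketch of applying the same function to a data frame column. The inline CSV text and the column names `id` and `code` are assumptions made for illustration; they are not from the original data.

```r
# Hypothetical CSV content; the column names `id` and `code` are assumptions
csv_text <- "id,code
1,11111-111-11
2,11111-1111-1
3,1111-1111-11"

# Read the code column as character so the values are kept exactly as written
dat <- read.csv(text = csv_text, colClasses = c("integer", "character"))

restyle <- function(number, fmt) {
  tmp <- as.list(as.integer(strsplit(number, '-')[[1]]))
  do.call(sprintf, modifyList(tmp, list(fmt = fmt)))
}

# Overwrite the column with the normalised 5-4-2 form
dat$code <- sapply(dat$code, restyle, fmt = '%05d-%04d-%02d', USE.NAMES = FALSE)
dat$code
# [1] "11111-0111-11" "11111-1111-01" "01111-1111-11"
```

With a real file, `read.csv("yourfile.csv", ...)` would replace the `text =` argument.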
Upvotes: 5
Reputation: 5566
Are you working in a Unix-like environment? It might be easier to use sed at the command line rather than R's regex functions. For example, for the 5-3-2 case:
echo "54324-965-23" | sed 's/\(.....\)-\(...\)-\(..\)/\1-0\2-\3/'
will spit back
54324-0965-23
If you want to apply it to the entire file it would look something like
cat file1.txt | sed 's/\(.....\)-\(...\)-\(..\)/\1-0\2-\3/' > file2.txt
And if you have multiple text-changing operations, you can pipe them all together:
cat file1.txt | sed 's/\(.....\)-\(...\)-\(..\)/\1-0\2-\3/' | sed '2ndthing' | sed 'thirdthing' > file2.txt
Upvotes: 3
Reputation: 43255
I would do this using sprintf and strsplit:
x <- c('11111-111-11', '11111-1111-1', '1111-1111-11')
y <- strsplit(x, '-')
myfun <- function(y) {
  # Zero-pad each segment to its target width
  first <- sprintf('%05d', as.integer(y[1]))
  second <- sprintf('%04d', as.integer(y[2]))
  third <- sprintf('%02d', as.integer(y[3]))
  paste(first, second, third, sep='-')
}
sapply(y, myfun)
# [1] "11111-0111-11" "11111-1111-01" "01111-1111-11"
You could also do this with fancy regular expressions or the gsubfn package, but that may be overkill!
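For comparison, a regular-expression version in base R might look like the sketch below. It assumes every value consists only of digits and hyphens and matches exactly one of the three short layouts; anything else is left unchanged.

```r
x <- c('11111-111-11', '11111-1111-1', '1111-1111-11')

# Each pattern is anchored with ^ and $ so it matches only one layout
x <- sub('^([0-9]{5})-([0-9]{3})-([0-9]{2})$', '\\1-0\\2-\\3', x)  # 5-3-2
x <- sub('^([0-9]{5})-([0-9]{4})-([0-9])$',    '\\1-\\2-0\\3', x)  # 5-4-1
x <- sub('^([0-9]{4})-([0-9]{4})-([0-9]{2})$', '0\\1-\\2-\\3', x)  # 4-4-2
x
# [1] "11111-0111-11" "11111-1111-01" "01111-1111-11"
```

Once a value has been padded to 5-4-2 it no longer matches either of the other patterns, so the order of the three calls does not matter.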
Upvotes: 8
Reputation:
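One approach is to remove the hyphens and then add them back at fixed character positions. Note that this only regroups the digits: it does not insert the missing zero, so the result below is in 4-4-2 form rather than the 5-4-2 form asked for: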
> v <- c("01234-567-89","01234-5678-9","0123-4567-89")
> v
[1] "01234-567-89" "01234-5678-9" "0123-4567-89"
> #remove hyphens
> v <- gsub("-","",v)
> v
[1] "0123456789" "0123456789" "0123456789"
> #add hyphens
> paste(substr(v,1,4),substr(v,5,8),substr(v,9,10),sep="-")
[1] "0123-4567-89" "0123-4567-89" "0123-4567-89"
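The hyphen positions are what tell you which segment is short, so a variant that pads each segment before re-joining may be closer to what the question asks for. The sketch below is one way to do that without converting to integers; the helper name `pad_segments` and its `widths` argument are made up for illustration.

```r
# Hypothetical helper: left-pad each hyphen-separated segment with zeros
pad_segments <- function(x, widths = c(5, 4, 2)) {
  parts <- strsplit(x, "-", fixed = TRUE)
  vapply(parts, function(p) {
    # Number of zeros each segment is missing (never negative)
    missing <- pmax(widths - nchar(p), 0)
    paste0(strrep("0", missing), p, collapse = "-")
  }, character(1))
}

v <- c("01234-567-89", "01234-5678-9", "0123-4567-89")
pad_segments(v)
# [1] "01234-0567-89" "01234-5678-09" "00123-4567-89"
```

Because the segments stay as strings, existing leading zeros are preserved and non-numeric characters would not be turned into NA.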
Upvotes: 0