Reputation: 199
Very simple problem in SAS, not so clear to me in R (beginner).
ID <- c('001','002','003')
name1 <- c('ZZ: John','YY: Pete','UU: Judy')
name2 <- c('55: Smith','78: Philips','99: Cortes')
name3 <- c('BB: Jr.','CC: Mr.','56: Dr.')
customer.data <- data.frame(ID, name1, name2, name3)
I want to delete the first 4 characters (including the space) from each variable so that the output looks like this:
ID name1 name2 name3
001 John Smith Jr.
002 Pete Philips Mr.
003 Judy Cortes Dr.
I need to do this over a long list of variables (not just 3, as in my example): apply the same substring function over and over again, then rewrite the data frame as shown.
I could accomplish this easily in SAS (my legacy program / trying to get away from)
ARRAY FIRSTSTUFF (3) name1 name2 name3;
ARRAY OUTPUTSTUFF (3) name1 name2 name3;
do i=1 to 3;
FORMAT OUTPUTSTUFF(i) $10.;
OUTPUTSTUFF(i)=substring(FIRSTSTUFF(i),5,10);
end;
I am baffled by the R approach to this. Any help is appreciated.
Upvotes: 0
Views: 562
Reputation: 887981
We loop through the columns of 'customer.data' except the first one (customer.data[,-1]) using lapply, extract the substring from the 5th character to the last character of the string using substr, and assign the output back to the corresponding columns of the dataset.
customer.data[,-1] <- lapply(customer.data[,-1],
function(x) substr(x,5,nchar(as.character(x))))
customer.data
# ID name1 name2 name3
#1 001 John Smith Jr.
#2 002 Pete Philips Mr.
#3 003 Judy Cortes Dr.
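To see what the anonymous function does to a single column, here is a minimal sketch on a standalone character vector. The vector x is made up for illustration and simply mirrors the name1 column.

```r
# Hypothetical vector mirroring one column of customer.data
x <- c("ZZ: John", "YY: Pete", "UU: Judy")

# Keep characters from position 5 through the end of each string;
# nchar(x) is vectorised, so each element gets its own stop position
trimmed <- substr(x, 5, nchar(x))
trimmed
# [1] "John" "Pete" "Judy"
```

lapply applies this same operation to every selected column and returns a list, which is why it can be assigned straight back into the data frame.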
Or, in the above, you don't have to specify the stop if you are using substring instead of substr (as @Richard Scriven showed in the comments).
customer.data[,-1] <- lapply(customer.data[-1], substring, 5)
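If the data frame holds other columns that should stay untouched, dropping only the first column may not be enough. A possible sketch, assuming the columns to trim all share the prefix "name" (an assumption based on the example data) and that the data frame is built with stringsAsFactors = FALSE:

```r
# Rebuild the example data with character (not factor) columns
customer.data <- data.frame(
  ID    = c("001", "002", "003"),
  name1 = c("ZZ: John", "YY: Pete", "UU: Judy"),
  name2 = c("55: Smith", "78: Philips", "99: Cortes"),
  name3 = c("BB: Jr.", "CC: Mr.", "56: Dr."),
  stringsAsFactors = FALSE
)

# Positions of the columns whose names start with "name" (assumed pattern)
cols <- grep("^name", names(customer.data))

# Trim the first 4 characters in just those columns
customer.data[cols] <- lapply(customer.data[cols], substring, 5)
customer.data
```

Selecting by name pattern scales to a long list of variables without typing each one, as long as the naming is consistent.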
Or you could use gsub to match the characters (.* - 0 or more characters) from the beginning of the string up to the : followed by one or more spaces ( +), and replace the match with '' (the second argument) in each column looped over by lapply.
customer.data[,-1] <- lapply(customer.data[,-1], function(x)
gsub(".*: +", "", x))
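One thing to keep in mind with the regex route is that .* is greedy, so it matches up to the last colon in a string. A small sketch on made-up strings (not from the original data) shows the difference from a pattern that stops at the first colon; the alternative pattern with sub is only a suggestion:

```r
# Hypothetical strings; the last one contains two colons
x <- c("ZZ: John", "ABC: Pete", "A: B: C")

# Greedy match: removes everything up to the last ": "
greedy <- gsub(".*: +", "", x)
greedy
# [1] "John" "Pete" "C"

# Anchored alternative: removes only up to the first ": "
first_only <- sub("^[^:]*: +", "", x)
first_only
# [1] "John" "Pete" "B: C"
```

Unlike substr or substring, both patterns also cope with prefixes of different lengths, such as "ABC: " above.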
Upvotes: 2