Reputation: 93
I am new to R programming and have searched SO for many hours. I would appreciate your help.
I have a data frame with 3 columns (Date, Description, Debit):
Date        Description  Debit
2014-01-01  "abcdef VA"  15
2014-01-01  "ghijkl NY"  56
I am trying to extract the last 2 characters of the second (Description) column, i.e. the two-letter state abbreviation. I am not very comfortable with apply-type functions.
I have tried using:
l <- lapply(a$Description, function(x) {substr(x, nchar(x)-2+1, nchar(x))})
but I get the following error message:
Error in nchar(x) : invalid multibyte string, element 1
I have tried multiple other approaches, but with the same error.
I am quite sure that I am missing something very basic, so I would appreciate your help.
Thanks
Upvotes: 5
Views: 11353
Reputation: 887971
We can use sub to strip everything up to and including the last whitespace character:
df$State <- sub(".*\\s+", "", df[,2])
df$State
#[1] "VA" "FL" "GA"
Upvotes: 0
Reputation: 4378
Here's a regex version, using Brandon S's sample data. The regex captures everything after the last whitespace character to the end of the string.
df <- data.frame(date = c("2015-01-01", "2015-02-01", "2015-01-15"),
                 jumble = c("12345 VA", "123 FL", "12354567732 GA"),
                 debit = c(15, 36, 20))
df$state <- gsub(".+\\s(.+)$", "\\1", df$jumble)
df
        date         jumble debit state
1 2015-01-01       12345 VA    15    VA
2 2015-02-01         123 FL    36    FL
3 2015-01-15 12354567732 GA    20    GA
Upvotes: 0
Reputation: 1223
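df <- data.frame(date = c("2015-01-01", "2015-02-01", "2015-01-15"),
                 jumble = c("12345 VA", "123 FL", "12354567732 GA"),
                 debit = c(15, 36, 20))
# Make sure the column is character, not factor
# (data.frame() created factors by default before R 4.0)
df$jumble <- as.character(df$jumble)
# substr() and nchar() are vectorized, so no lapply() is needed:
# take the characters from position n-1 to n of each string
df$state <- substr(df$jumble, nchar(df$jumble)-1, nchar(df$jumble))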
df
        date         jumble debit state
1 2015-01-01       12345 VA    15    VA
2 2015-02-01         123 FL    36    FL
3 2015-01-15 12354567732 GA    20    GA
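Note that nchar() can still raise the "invalid multibyte string" error from the question if the column contains bytes that are not valid in the session's encoding. A minimal sketch of one possible workaround follows. It assumes the data was read from a Latin-1 file, which the question does not state; the vector x and the byte 0xE9 are made up for illustration.

```r
# Hypothetical input: the second element contains the Latin-1 byte 0xE9,
# which is not valid UTF-8 on its own
x <- c("abcdef VA", "caf\xe9 NY")

# Assumption: the source encoding is Latin-1; change `from` to match your data
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Once the strings are valid, the vectorized substr() approach applies
n <- nchar(x_utf8)
state <- substr(x_utf8, n - 1, n)
state
```

If the real encoding is unknown, the bytes may need to be inspected first. Declaring the encoding when the file is read (for example, the fileEncoding argument of read.csv) avoids the conversion step.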
Upvotes: 1