Reputation: 466
I want to extract the last n elements (words) of each row in a dataframe. The dataframe I'm working on (head_col) has one column, and I wish to split this column into more columns: one holding the last element of the original, another holding the last two, and one holding the last three.
Through searching around I found very helpful topics such as this and other related ones, but I am such a regex novice that I can't manage to rewrite the snippet so that it takes the last two/three elements. I also tried to play around with packages like stringi and its stri_extract_last_words, but this also takes just the last word. Any pointers on if/how to use this very handy stringi function to get what I want would be very appreciated.
link to the source .xls file - https://www.dropbox.com/s/c1ftjwine8ekj65/Book2_1.xls?dl=0
library(data.table)
library(XLConnect)
library(stringr)
library(stringi)
#load .xls
wb <- loadWorkbook('D:/MOMUT1/GIS_Workload/Other/alex/Book2_1.xls')
df <- readWorksheet(wb, 1, header = TRUE)
#remove NAs
df_final <- subset(df, !is.na(df$HEADLINE))
#take out HEADLINE column to work on
head_col <- data.table(df_final$HEADLINE)
#regex attempts
head_col_last_1 <- sub(".*\\s+", '', head_col$V1) # takes only last word
head_col_last_2 <- gsub(".*\\s+(.*)", "\\1", head_col$V1) #also takes only last word
#stringi attempt
head_col_last_1 <- data.frame(stri_extract_last_words(head_col$V1))
Upvotes: 1
Views: 434
Reputation: 626747
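You may use str_extract from stringr. The pattern below returns the last three words of each string, the last two when a string has only two words, and NA when it has only one: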
head_col_last_1 <- str_extract(head_col$V1, "\\S+(?:\\s+\\S+){1,2}(?=\\s*$)")
The pattern matches:
\\S+ - 1+ non-whitespace chars
(?:\\s+\\S+){1,2} - one or two occurrences of
    \\s+ - 1+ whitespace chars
    \\S+ - 1+ non-whitespace chars
(?=\\s*$) - that are followed with 0+ whitespaces and the end of string.
Upvotes: 1
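To get separate columns for the last one, two, and three words, the same pattern can be parameterised by replacing the {1,2} quantifier with an exact count. The following is a minimal sketch, not part of the original answer: last_n_words is a hypothetical helper name, and the headlines vector is made-up sample data standing in for head_col$V1.

```r
library(stringr)

# Hypothetical helper (not from the original answer): return the last n
# whitespace-separated words of each string, or NA when a string has fewer.
last_n_words <- function(x, n) {
  stopifnot(n >= 1)
  # \S+ matches the first of the n words, (?:\s+\S+){n-1} the remaining ones,
  # and (?=\s*$) requires only optional whitespace up to the end of the string.
  pattern <- sprintf("\\S+(?:\\s+\\S+){%d}(?=\\s*$)", n - 1)
  str_extract(x, pattern)
}

# Made-up sample data in place of head_col$V1
headlines <- c("one two three four five", "alpha beta  ", "single")

last_n_words(headlines, 1)  # "five" "beta" "single"
last_n_words(headlines, 2)  # "four five" "alpha beta" NA
last_n_words(headlines, 3)  # "three four five" NA NA
```

With the data.table from the question, something like head_col[, last_2 := last_n_words(V1, 2)] should add the result as a new column. Strings with fewer than n words yield NA rather than a shorter match; if a shorter match is preferred, a range quantifier such as {0,2} could be used instead.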