Momchill

Reputation: 466

remove last n elements from a row in a dataframe

I want to pull out the last n elements (words) of each row in a dataframe. The dataframe I'm working on (head_col) has one column, and I wish to split it into more columns: one holding the last element of the original, another holding the last two, and one holding the last three.

Through searching around I found very helpful topics such as this and other related ones, but I am such a regex novice that I can't manage to rewrite the snippet so that it takes the last two/three elements. I also tried to play around with packages like stringi and its stri_extract_last_words, but this also takes just the last word. Any pointers on if/how to use this very handy stringi function to get what I want would be much appreciated.

link to the source .xls file - https://www.dropbox.com/s/c1ftjwine8ekj65/Book2_1.xls?dl=0

library(data.table)
library(XLConnect)
library(stringr)
library(stringi)

#load .xls
wb <- loadWorkbook('D:/MOMUT1/GIS_Workload/Other/alex/Book2_1.xls')
df <- readWorksheet(wb, 1, header = TRUE)

#remove NAs
df_final <- subset(df, !is.na(df$HEADLINE))

#take out HEADLINE column to work on
head_col <- data.table(df_final$HEADLINE)

#regex attempts
head_col_last_1 <- sub(".*\\s+", '', head_col$V1) # takes only last word
head_col_last_2 <- gsub(".*\\s+(.*)", "\\1", head_col$V1) #also takes only last word

#stringi attempt
head_col_last_1 <- data.frame(stri_extract_last_words(head_col$V1))

Upvotes: 1

Views: 434

Answers (1)

Wiktor Stribiżew

Reputation: 626747

You may use

head_col_last_1 <- str_extract(head_col$V1, "\\S+(?:\\s+\\S+){1,2}(?=\\s*$)")

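Because the {1,2} quantifier is greedy, this returns the last three words of each string, or the last two when the string has only two words (and NA when it has a single word). The pattern matches:

  • \\S+ - 1+ non-whitespace chars
  • (?:\\s+\\S+){1,2} - one or two occurrences of
    • \\s+ - 1+ whitespace chars
    • \\S+ - 1+ non-whitespace chars
  • (?=\\s*$) - that are followed by 0+ whitespace chars and the end of the string.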
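
To get separate columns for the last one, two, and three words, one option is to fix the quantifier to exactly n - 1 repetitions instead of {1,2}. The following is only a sketch under that assumption: last_n_words and the sample headlines vector are hypothetical names introduced here for illustration and are not part of the original data or of stringr.

```r
library(stringr)

# Hypothetical helper (an assumption, not a stringr function):
# returns the last n whitespace-separated words of each string,
# or NA when a string has fewer than n words.
last_n_words <- function(x, n) {
  stopifnot(n >= 1)
  # One word, then exactly n - 1 further "whitespace + word" groups,
  # anchored at the end of the string (trailing whitespace allowed).
  pattern <- sprintf("\\S+(?:\\s+\\S+){%d}(?=\\s*$)", as.integer(n) - 1L)
  str_extract(x, pattern)
}

# Made-up sample data standing in for head_col$V1
headlines <- c("one two three four", "alpha beta  ", "single")

result <- data.frame(
  last_1 = last_n_words(headlines, 1),
  last_2 = last_n_words(headlines, 2),
  last_3 = last_n_words(headlines, 3),
  stringsAsFactors = FALSE
)
print(result)
```

With the real data, the same calls could be made on head_col$V1 in place of headlines. Rows with fewer than n words come back as NA rather than a shorter match, which keeps the three columns easy to tell apart.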

Upvotes: 1
