Reputation: 1613
I have the following dataframe:
df = data.frame(Text = c("This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.", "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.", "SO is great. You can get many things solve. Additional paragraph."), stringsAsFactors = F)
I have used the following to split the text into sentences:
library(textshape)
split_sentence(df$Text)
However, I would like to subset the "Text" column every 2 sentences, so as to get a list like:
This is great.
A really great place to be.
Good morning.
There are very skilled programmers here.
SO is great.
You can get many things solve.
Can anyone help me?
Thanks!
Upvotes: 0
Views: 200
Reputation: 5788
Base R solution. Note that n can be set to any positive integer; the code then follows a retain / skip pattern, keeping n sentences, dropping the next n, and so on.
# Number of sentences to keep before removing the same number of sentences: n => integer scalar
n <- 2
# Collapse all rows into one string, split it into sentences, then keep
# n sentences and skip the next n: res => character vector
res <- subset(data.frame(sentences = unlist(strsplit(paste0(df$Text, collapse = " "), "(?<=\\.)\\s+", perl = TRUE))),
              ceiling(seq_along(sentences) / n) %% 2 == 1)[ , 1, drop = TRUE]
# Print the result to console: character vector => stdout (console)
res
# Data:
df = data.frame(Text = c("This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.", "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.", "SO is great. You can get many things solve. Additional paragraph."), stringsAsFactors = F)
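To see why the retain / skip filter works, it can help to look at the index arithmetic on its own. The sketch below uses a hypothetical helper, keep_skip_mask, which is not part of the answer above; it only reproduces the ceiling(seq_along(...) / n) %% 2 == 1 expression for a vector of a given length.

```r
# Hypothetical helper (not from the original answer): returns TRUE for
# positions that fall in an odd-numbered block of size n, FALSE otherwise.
keep_skip_mask <- function(len, n) {
  ceiling(seq_len(len) / n) %% 2 == 1
}

# With 11 sentences and n = 2, the block numbers are 1,1,2,2,3,3,...
# and only the odd-numbered blocks are kept.
mask <- keep_skip_mask(11, 2)
which(mask)
# [1]  1  2  5  6  9 10
```

Because the rows are collapsed into one string before splitting, the kept positions coincide with the first n sentences of each row only when every row but the last holds a multiple of 2 * n sentences, as the sample data happens to (4, 4, and 3 sentences with n = 2).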
Upvotes: 1
Reputation: 17299
Another option with strsplit and head:
unlist(lapply(strsplit(df$Text, '(?<=\\.)\\s*', perl = TRUE), head, 2))
# [1] "This is great." "A really great place to be."
# [3] "Good morning." "There are very skilled programmers here."
# [5] "SO is great." "You can get many things solve."
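If the number of sentences to keep needs to vary, the same strsplit and head idea can be wrapped in a small function. This is only a minimal sketch: first_n_sentences is a hypothetical name, the input is a shortened version of the sample data, and it assumes every sentence ends with a period followed by whitespace.

```r
# Hypothetical wrapper (name and signature are assumptions for illustration).
# Splits each element of `text` at whitespace that follows a period and
# keeps the first `n` sentences of each element.
first_n_sentences <- function(text, n = 2) {
  pieces <- strsplit(text, "(?<=\\.)\\s+", perl = TRUE)
  unlist(lapply(pieces, head, n))
}

# Shortened sample input, for illustration only.
text <- c("This is great. A really great place to be. For sure.",
          "Good morning. There are very skilled programmers here. I am sure.")
first_n_sentences(text, n = 1)
# [1] "This is great." "Good morning."
```

Unlike the collapsed retain / skip approach, this works row by row, so it does not depend on how many sentences each row contains.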
Upvotes: 3
Reputation: 389055
You could split Text into separate rows, one per sentence, and keep only the first 2 sentences from each original row. Using dplyr and tidyr, you can do this as:
library(dplyr)
df %>%
  mutate(row = row_number()) %>%
  tidyr::separate_rows(Text, sep = '\\.\\s*') %>%
  group_by(row) %>%
  slice(1:2) %>%
  ungroup() %>%
  select(-row)
# Text
# <chr>
#1 This is great
#2 A really great place to be
#3 Good morning
#4 There are very skilled programmers here
#5 SO is great
#6 You can get many things solve
Upvotes: 2