Reputation: 1

How can I extract sentences with certain text in a spreadsheet?

I have a spreadsheet that looks like this. I would like to keep the file column but extract only the sentences containing the word "India". Is there a way to do that? I'd prefer to use KNIME or R, but I'm happy with any solution.

Desired result: only the sentences containing "India" are extracted, while the file column is kept.

Upvotes: -3

Answers (2)

akrun

Reputation: 887911

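We can use base R, combining subset() with grepl(). With ignore.case = TRUE this keeps every row whose Text contains "India" in any capitalisation, and the File column is kept alongside it: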

subset(df, grepl("India", Text, ignore.case = TRUE))
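Neither answer shows getting the data into and out of R. As a sketch, assuming the spreadsheet has been saved as CSV with columns named File and Text (both assumptions), the grepl() filter fits into a read/filter/write round trip like this. Temporary files stand in for the real input and output paths; for an .xlsx file, readxl::read_excel() could be used in place of read.csv().

```r
# Assumption: the spreadsheet was exported to CSV with columns File and Text.
# A temporary file with made-up rows stands in for the real export here.
csv_in <- tempfile(fileext = ".csv")
write.csv(
  data.frame(File = c(1356, 1600),
             Text = c("Digital India is an initiative.", "Some other text")),
  csv_in, row.names = FALSE
)

df <- read.csv(csv_in, stringsAsFactors = FALSE)

# Keep every row whose Text mentions "India" in any capitalisation;
# the File column is carried along unchanged.
india_rows <- subset(df, grepl("India", Text, ignore.case = TRUE))

# Write the filtered rows back out (again to a temporary file here).
csv_out <- tempfile(fileext = ".csv")
write.csv(india_rows, csv_out, row.names = FALSE)
print(india_rows)
```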

Upvotes: 1

L Tyrone

Reputation: 7205

This can be achieved using dplyr together with str_detect() from the stringr package. Note that the pattern "India|india" in the following code matches both "India" and the incorrectly lowercased "india", in case it appears:

library(dplyr)
library(stringr)

# Some example data
df <- data.frame(File = c(1356, 1548, 1600, 1601),
                 Text = c("Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i",
                          "The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti",
                          "Some other text",
                          "This string has india without a capital I."))

# Keep only the rows whose Text contains "India" or "india"
df <- df %>%
  filter(str_detect(Text, "India|india"))

df
#   File   Text
# 1 1356   Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i
# 2 1548   The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti
# 3 1601   This string has india without a capital I.
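Both answers keep whole rows. If a Text cell can hold several sentences and only the sentences mentioning "India" are wanted, one option is to split each cell into sentences first and then filter, repeating the File value for every sentence. The sketch below does this in base R with a naive splitter that breaks after ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." would be split incorrectly. The example data are made up for illustration.

```r
# Hypothetical example data: some cells contain several sentences.
df <- data.frame(
  File = c(1356, 1600),
  Text = c("India is large. The weather varies. Delhi is in India.",
           "Some other text. Nothing here."),
  stringsAsFactors = FALSE
)

# Naive sentence split: break on whitespace that follows ., ! or ?
sentences <- strsplit(df$Text, "(?<=[.!?])\\s+", perl = TRUE)

# One row per sentence, repeating each File value once per sentence.
long <- data.frame(
  File = rep(df$File, lengths(sentences)),
  Sentence = unlist(sentences),
  stringsAsFactors = FALSE
)

# Keep only the sentences that mention "India", ignoring case.
india_sentences <- long[grepl("india", long$Sentence, ignore.case = TRUE), ]
rownames(india_sentences) <- NULL
print(india_sentences)
```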

Upvotes: 0
