How to extract a specific string followed by any number?

Question

I have a small problem. I have text in this format:

```
A.1 Goals

Section 1: Blah Blah Blah
Random sentence A. Random sentence.
Section 2: Blah Blah Blah
Random sentence A.
Random sentence.

A.2 description
```

I want to obtain the following output:

```
A.1 Goals

Section 1: Blah Blah Blah

Section 2: Blah Blah Blah

A.2 description
```

So basically, how can I extract any string that occurs more than once followed by a number, i.e. any pattern made of the same string with varying numbers after it?

akrun · Accepted Answer

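We can use `grep` after reading the file with `readLines`. The pattern matches either a line that starts with "A" followed by a literal `.` and one or more digits (`^A\.\d+`), or (`|`) a line that starts with "Section" (`^Section`), followed by any characters (`.*`) and then a word, with any trailing spaces, that is immediately repeated (`\b(\w+\s*)\1`, where `\b` is a word boundary and `\1` is the backreference to the captured word). In the sample text this is the "Blah Blah" part of each section line.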

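```r
# No outer parentheses, so (\w+\s*) is capture group 1 and \1 refers to it;
# perl = TRUE evaluates the pattern with PCRE
out <- grep("^A\\.\\d+|^Section.*\\b(\\w+\\s*)\\1", lines, value = TRUE, perl = TRUE)
cat(out, sep = "\n\n")
#A.1 Goals
#
#Section 1: Blah Blah Blah
#
#Section 2: Blah Blah Blah
#
#A.2 description
```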
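To see what each half of the alternation contributes, here is a minimal sketch that tests the two halves separately with `grepl()`. The `lines` vector is typed in by hand from the question's sample, an assumption standing in for `readLines("file.txt")`, and `perl = TRUE` is used so the backreference is evaluated by PCRE.

```r
# Sample lines from the question, typed in directly (assumed; no file needed)
lines <- c("A.1 Goals", "Section 1: Blah Blah Blah",
           "Random sentence A. Random sentence.", "A.2 description")

# Half 1: "A", a literal dot, then one or more digits at the start of the line
heading <- grepl("^A\\.\\d+", lines, perl = TRUE)

# Half 2: "Section" at the start, then a word (plus trailing spaces)
# that is immediately repeated, e.g. "Blah Blah"
section <- grepl("^Section.*\\b(\\w+\\s*)\\1", lines, perl = TRUE)

# One row per line, showing which half (if any) matched it
data.frame(lines, heading, section)
```

Each line is picked up by at most one half. When both halves are combined into one pattern, wrapping each alternative in its own parentheses would make those capture groups 1 and 2 and push the word group to number 3, so `\1` has to point at the group that actually captures the word.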

data

```r
lines <- readLines("file.txt")  # read the file, one element per line
```
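The pattern above relies on the literal prefixes "A." and "Section". If the goal is the more general one from the question, keeping any leading string that appears with a number on more than one line, one possible sketch follows. It assumes the "string" is everything before the first run of digits at the start of a line; `extract_numbered_headings` is a hypothetical helper name, not part of any package, and the sample `lines` vector is typed in by hand.

```r
# Sample text from the question (assumed; normally readLines("file.txt"))
lines <- c("A.1 Goals", "", "Section 1: Blah Blah Blah",
           "Random sentence A. Random sentence.",
           "Section 2: Blah Blah Blah", "Random sentence A.",
           "Random sentence.", "", "A.2 description")

# Hypothetical helper: keep lines whose leading text (everything before the
# first run of digits) appears, followed by a number, on more than one line
extract_numbered_headings <- function(lines) {
  pattern <- "^([^0-9]+)[0-9]+.*$"
  numbered <- grepl(pattern, lines)
  # Prefix before the number: "A." for "A.1 Goals", "Section " for "Section 2: ..."
  prefix <- ifelse(numbered, sub(pattern, "\\1", lines), NA_character_)
  counts <- table(prefix)  # table() drops NA values by default
  repeated <- names(counts)[counts > 1]
  lines[!is.na(prefix) & prefix %in% repeated]
}

out <- extract_numbered_headings(lines)
cat(out, sep = "\n\n")
```

Because prefixes are counted with `table()`, a prefix that occurs only once (for example a lone "Appendix 1" line) is dropped. That matches "repeated more than once", but the `counts > 1` condition would need adjusting if one-off numbered headings should be kept as well.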
