J.Doe

How to extract sentences containing specific words or characters from a text file using R

I'm new to R and am stuck on the following problem.

I have a text file that contains section numbers, such as:

1.0
1.1
1.1.1
1.2
etc.

I have another text file in which some lines begin with these section numbers, such as:

1.0       General
Random sentence.
1.1       Description
Random sentence.
1.1.1     Background
Random sentence.

I only want to extract the lines that start with a section number, so basically:

1.0 General
1.1 Description
1.1.1 Background

Answers (1)

akrun

After reading both files with readLines (see the data section below), we can use grep to keep only the lines of the second file that start with a number:

grep("^[0-9.]+", txt2, value = TRUE)
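A self-contained sketch of this first approach, using made-up sample lines in place of readLines("file2.txt"). One caveat: "^[0-9.]+" also keeps an ordinary sentence that happens to start with a number. The stricter pattern below assumes that every section number contains at least one dot (true for all the examples in the question) and is followed by whitespace; the line "2 apples were bought." is hypothetical and only there to show what gets filtered out.

```r
# Sample lines standing in for readLines("file2.txt")
txt2 <- c("1.0       General",
          "Random sentence.",
          "1.1       Description",
          "Random sentence.",
          "1.1.1     Background",
          "2 apples were bought.")

# Digits, then one or more ".digits" groups, then whitespace.
# Assumption: section numbers always contain at least one dot,
# so a sentence starting with a plain number is not kept.
headings <- grep("^[0-9]+(\\.[0-9]+)+\\s", txt2, value = TRUE)
print(headings)
```

If some section numbers have no dot at all (e.g. "2"), this pattern would miss them, and the whitelist approach based on the first file is the safer choice.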

If other lines can also begin with a number, read the first file as well and keep only the lines whose leading token (the text before the first whitespace) appears in it. This can be done with grep, or with %in% after extracting that substring:

# keep lines whose first token (text before the first whitespace) is a listed section number
out <- txt2[sub("\\s+.*", "", txt2) %in% txt1]
cat(out, sep="\n")
#1.0       General
#1.1       Description
#1.1.1     Background
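The grep route can be sketched by turning the section numbers from the first file into one anchored pattern. The dots need escaping, otherwise "1.1" would also match something like "1x1". Sample vectors stand in for the two files; the line "3.5 is not a listed section" is hypothetical and shows that numbers missing from the first file are dropped.

```r
# Sample data standing in for readLines("file1.txt") and readLines("file2.txt")
txt1 <- c("1.0", "1.1", "1.1.1", "1.2")
txt2 <- c("1.0       General",
          "Random sentence.",
          "1.1       Description",
          "3.5 is not a listed section",
          "1.1.1     Background")

# Escape the dots so they match a literal "." rather than any character
escaped <- gsub("\\.", "\\\\.", trimws(txt1))

# Build ^(1\.0|1\.1|1\.1\.1|1\.2)\s : a listed number at the start, then whitespace
pattern <- paste0("^(", paste(escaped, collapse = "|"), ")\\s")

out <- grep(pattern, txt2, value = TRUE)
cat(out, sep = "\n")
```

The trailing \\s keeps "1.1" from matching the start of a line such as "1.12 ...". With a long list of section numbers, the %in% version is simpler and avoids building one large regex.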

data

txt1 <- readLines("file1.txt")
txt2 <- readLines("file2.txt")
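A fully reproducible version of the %in% approach, which writes two temporary files with made-up contents so it runs without file1.txt and file2.txt. The question's desired output has a single space after the number; collapsing runs of whitespace with gsub is one assumed way to get that, and it also copes with tab separators.

```r
# Write two small sample files so the example runs anywhere
file1 <- tempfile(fileext = ".txt")
file2 <- tempfile(fileext = ".txt")
writeLines(c("1.0", "1.1", "1.1.1", "1.2"), file1)
writeLines(c("1.0\tGeneral", "Random sentence.",
             "1.1       Description", "Random sentence.",
             "1.1.1     Background", "Random sentence."), file2)

# trimws guards against trailing spaces in the section-number file
txt1 <- trimws(readLines(file1))
txt2 <- readLines(file2)

# Leading token = everything before the first run of whitespace
keep <- sub("\\s+.*", "", txt2) %in% txt1

# Collapse runs of spaces/tabs to a single space, e.g. "1.0 General"
out <- gsub("\\s+", " ", txt2[keep])
cat(out, sep = "\n")
```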
