TheGoat

Reputation: 2877

Create repeated range of values by adding fixed values each time

I have imported a large PDF into R and need to read only certain lines from it. The file was imported with pdftools, and the resulting object is a character vector of length 10353 (so nrow() returns NULL).

nrow(PDF)
NULL
class(PDF)
[1] "character"
str(PDF)
 chr [1:10353] "Itemized Statement For:" "Patient Name:             SMITH ,JOHN" "POLICY ID:                000000000" ...

I need to extract the following lines:

PDF.clean <- PDF[c(7:38, 47:78, 87:118, ............)]

As shown above, the first block is lines 7:38, and each subsequent block is obtained by adding 40 to both endpoints, repeating until the end of the document is reached.

Is there a smart way to set initial values such as x = 7 and y = 38, keep adding 40 to each of them for as long as the values do not exceed 10353, and build the subset indices that way?

Upvotes: 0

Views: 39

Answers (1)

Ronak Shah

Reputation: 389335

You can create a sequence of offsets from 0 to the end value in steps of 40 and add each offset to 7:38 to get all the indices that you want to extract. Then remove any indices that are greater than end, since the last block can run past the end of the document.

end <- 10353
inds <- c(sapply(seq(0, end, 40), `+`, 7:38))
inds <- inds[inds <= end]

head(inds, 35)
# [1]  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
#[25] 31 32 33 34 35 36 37 38 47 48 49

tail(inds, 35)
# [1] 10311 10312 10313 10314 10315 10316 10317 10318 10327 10328 10329 10330
#[13] 10331 10332 10333 10334 10335 10336 10337 10338 10339 10340 10341 10342
#[25] 10343 10344 10345 10346 10347 10348 10349 10350 10351 10352 10353
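To see how the indices are put together, here is a scaled-down sketch of the same idea. The numbers are made up for illustration (a 12-line document, blocks 2:4 repeating every 5 lines), and m is just a hypothetical name for the intermediate matrix.

```r
# Toy version: keep lines 2:4, repeating every 5 lines, in a 12-line document
end <- 12

# sapply() returns a matrix with one column per offset (0, 5, 10);
# each column holds the block 2:4 shifted by that offset
m <- sapply(seq(0, end, 5), `+`, 2:4)
print(m)

# c() flattens the matrix column by column, so the blocks stay in order
inds <- c(m)

# the last block (12, 13, 14) runs past the end, so trim it
inds <- inds[inds <= end]
print(inds)
```

Because the matrix is flattened column by column, the indices come out already sorted, and only the final block ever needs trimming.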

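You can use this to subset PDF. Since PDF is a character vector rather than a data frame or matrix, index it with a single subscript (no comma).

PDF.clean <- PDF[inds]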
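As an alternative sketch, the same lines can probably be selected without building the index vector at all, by testing each position with modulo arithmetic. The PDF object below is a stand-in created only so the example runs on its own, and first, last, step, pos and keep are hypothetical names, not part of the original question or answer.

```r
# Stand-in for the real data (assumption): a character vector of the same length
PDF <- paste("line", 1:10353)

first <- 7    # first line of the first block
last  <- 38   # last line of the first block
step  <- 40   # distance between the starts of consecutive blocks

pos <- seq_along(PDF)

# A position is kept if, counted from `first`, it falls within the
# first (last - first + 1) lines of its 40-line cycle
keep <- pos >= first & (pos - first) %% step <= (last - first)

PDF.clean <- PDF[keep]
```

This version needs no trimming step, because pos never exceeds the length of PDF; whether it reads more clearly than the sapply() approach is a matter of taste.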

Upvotes: 1
