Sher

Reputation: 415

Extracting numbers from text file in R

I have multiple text files. From each one I want to extract the number in the row containing "never classified (0)" and return it, together with the file name, as a data frame.

files <- list.files(path= "directory/info/", pattern= "*.txt", full.names = TRUE)

data <- lapply(files, function(x) {

  datxt <- read.table(x, sep = "\t", header = TRUE, stringsAsFactors = FALSE)


  for (i in 1:length(datxt)){
    i = gsub("\\never classified (0)", "", i)
    }

 return(data.frame(file=x,NoOfReturn=i))
})

Sample text looks like this:

LASzip compression (version 3.4r1 c2 50000): POINT10 2
reporting minimum and maximum for all LAS point record entries ...
  X                   0        527
  Y                   0       2009
  Z                   0        241
  intensity           1        314
  return_number       1          1
  number_of_returns   1          1
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      0          0
  scan_angle_rank     0          0
  user_data           0          0
  point_source_ID     0          0
number of first returns:        2781080
number of intermediate returns: 0
number of last returns:         2781080
number of single returns:       2781080
overview over number of returns of given pulse: 2781080 0 0 0 0 0 0
histogram of classification of points:
         2781080  never classified (0)

For this file, I want to return the file name and 2781080 as a data frame.

Upvotes: 1

Views: 1274

Answers (1)

Gregor Thomas

Reputation: 145755

This should work:

# pattern is a regular expression, not a glob, so anchor on the extension
files <- list.files(path = "directory/info/", pattern = "\\.txt$", full.names = TRUE)
data <- lapply(files, function(x) {
  # the data we're interested in doesn't seem to be a table 
  # easier to read it in as a character vector
  datxt <- readLines(x)

  # keep only the line with the text we're looking for
  datxt <- datxt[grepl(pattern = "never classified (0)", x = datxt, fixed = TRUE)]

  # get the number from that line
  n <- sub(pattern = "never classified (0)", replacement = "", x = datxt, fixed = TRUE)
  n <- as.numeric(trimws(n))

  return(data.frame(file = x, NoOfReturn = n))
})
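
Note that `lapply()` returns a list of one-row data frames rather than a single data frame, and `data.frame()` will raise an error for any file where the target line is missing. A minimal sketch of one way to handle both points is below. The helper name `extract_never_classified`, the `NA` fallback, the use of `basename()` for the file column, and the temporary demo files are assumptions for illustration, not part of the original code.

```r
# Hypothetical helper: pull the count that precedes "never classified (0)"
# out of a single report file and return it as a one-row data frame.
extract_never_classified <- function(path) {
  target <- "never classified (0)"
  lines <- readLines(path)

  # fixed = TRUE treats the parentheses literally instead of as regex groups
  hit <- lines[grepl(target, lines, fixed = TRUE)]

  if (length(hit) == 0) {
    # Assumption: a file without the line yields NA rather than an error
    n <- NA_real_
  } else {
    n <- as.numeric(trimws(sub(target, "", hit[1], fixed = TRUE)))
  }

  data.frame(file = basename(path), NoOfReturn = n, stringsAsFactors = FALSE)
}

# Demo input: two temporary files standing in for the real reports
tmp <- tempfile("info")
dir.create(tmp)
writeLines(c("histogram of classification of points:",
             "         2781080  never classified (0)"),
           file.path(tmp, "a.txt"))
writeLines("number of first returns:        10", file.path(tmp, "b.txt"))

files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Stack the per-file rows into a single data frame
result <- do.call(rbind, lapply(files, extract_never_classified))
print(result)
```

To keep the full path in the `file` column, as in the code above, drop the `basename()` call.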

Upvotes: 1
