DJ-AFC
DJ-AFC

Reputation: 569

Extract comments from text file and append to dataset

I have a dataset that takes the form:

# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c
# grid point lon,lat =   22.500   63.808
 1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7
 1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5
...

I wish to extract the longitude and latitude from the comments at the top of this text file and add them repeatedly as two additional columns in this dataset. My output should therefore look like so:

# prate [mm/day] from 4x Daily NOAA-CIRES 20th Century Reanalysis V2c
# grid point lon,lat =   22.500   63.808
 1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7 22.500 63.808
 1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5 22.500 63.808
...

Does anyone have any ideas on how to achieve this?

Upvotes: 1

Views: 66

Answers (2)

Jaap
Jaap

Reputation: 83215

Using:

dat <- read.table('dataset.txt', header = FALSE, skip = 2)
txt <- readLines('dataset.txt', n = 2)

llcols <- read.table(text = trimws(gsub('.*=','',txt[2])), header = FALSE)
names(llcols) <- c('lon','lat')

dat <- cbind(dat, llcols)

gives:

> dat
    V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13  lon    lat
1 1851 1.8 0.9 1.7 1.5 1.6 2.7 2.7 2.6 1.3 2.5 1.8 1.7 22.5 63.808
2 1852 2.2 1.6 0.9 1.4 1.6 2.5 2.4 2.0 1.8 2.3 1.9 1.5 22.5 63.808

Explanation:

  • With dat <- read.table('dataset.txt', header = FALSE, skip = 2) you read the data and exclude the two comment lines.
  • With txt <- readLines('dataset.txt', n = 2) you read the two comment lines as text.
  • With trimws(gsub('.*=','',txt[2])) you extract the lon/lat values.
  • And read the result with read.table in a new dataframe.
  • With cbind you combine the two dataframes into one. The lon/lat values will be repeated till the end of dat.

Reading a bunch of files can be done as follows:

filenames <- list.files(pattern = '.txt')

dflist <- lapply(filenames, function(x) {
  dat <- read.table(x, header = FALSE, skip = 2)
  txt <- readLines(x, n = 2)

  llcols <- read.table(text = trimws(gsub('.*=','',txt[2])), header = FALSE)
  names(llcols) <- c('lon','lat')

  cbind(dat,llcols)
})

Upvotes: 2

DJ-AFC
DJ-AFC

Reputation: 569

I have now found the solution for applying this to lists as follows:

dat=apply(data.frame(list.files()), 1, read.table, header=F, skip=2)
txt=apply(data.frame(list.files()), 1, readLines, n=2)
llcols=lapply(txt, function(x) read.table(text =trimws(gsub('.*=','',txt[2])), header = FALSE))
names(LLCOLS)=c('lon','lat')
dat=lapply(dat, function(x) cbind(x, llcols))

Upvotes: 1

Related Questions