jj2593

Reputation: 77

How can I read specific lines of a poorly formatted text file into a data frame (preferably multiple data frames)?

I'm completely new to R and not sure of the best way to deal with this file, so I'm hoping someone can at least point me in the right direction. I've searched for other solutions and tried using grepl, but I can't figure out how to read only some of the data. The file I'm trying to read looks something like the text below:

##BLOCKS= 8          
Plate:  Plate01 1.3 PlateFormat Endpoint    Absorbance  Raw FALSE   1               1   630 1   12  96  1   8   None    
Temperature(°C) 1       2       3       4       5       6       7       8       9       10      11      12      
0.00            0.042   0.067   0.292   0.206   0.071   0.067   0.04    0.063   0.059   0.04    0.066   0.04        
                0.043   0.172   0.179   0.199   0.073   0.067   0.04    0.062   0.058   0.039   0.066   0.039       
                 0.04   0.066   0.29    0.185   0.072   0.067   0.04    0.062   0.058   0.039   0.065   0.039       
                0.039   0.068   0.291   0.189   0.075   0.069   0.04    0.064   0.058   0.041   0.064   0.039       
                0.042   0.063   0.271   0.191   0.07    0.068   0.04    0.065   0.058   0.041   0.066   0.04        
                0.041   0.067   0.342   0.199   0.069   0.066   0.041   0.065   0.057   0.04    0.065   0.042       
                0.044   0.064   0.295   0.198   0.069   0.067   0.039   0.064   0.057   0.04    0.067   0.041       
                0.041   0.067   0.29    0.211   0.066   0.067   0.043   0.056   0.058   0.042   0.067   0.042       

~End
Plate:  Plate#1 1.3 PlateFormat Endpoint    Absorbance  Raw FALSE   1                       1   630 1   12  96  1   8   None    
Temperature(°C) 1       2       3       4       5       6       7       8       9       10      11      12      
0.00            0.042   0.072   0.257   0.165   0.074   0.07    0.04    0.067   0.055   0.04    0.07    0.04        
                0.042   0.164   0.136   0.195   0.075   0.07    0.041   0.066   0.055   0.04    0.069   0.04        
                0.041   0.07    0.344   0.198   0.074   0.069   0.041   0.065   0.055   0.04    0.068   0.04        
                0.04    0.069   0.307   0.199   0.075   0.072   0.041   0.067   0.055   0.043   0.068   0.041       
                0.043   0.068   0.296   0.214   0.072   0.071   0.042   0.067   0.055   0.041   0.068   0.041       
                0.041   0.071   0.452   0.241   0.072   0.069   0.042   0.067   0.054   0.041   0.068   0.043       
                0.044   0.068   0.299   0.182   0.071   0.071   0.042   0.067   0.054   0.041   0.069   0.041       
                0.042   0.071   0.333   0.13    0.068   0.07    0.042   0.058   0.054   0.042   0.07    0.041       

~End

I only want the columns numbered 1-12 (next to Temperature) and the rows of data under them. I'm new to R but have some programming experience, so I don't necessarily need an exact solution. If anyone could point me toward the functions I should be looking at, I'd really appreciate the help!

Upvotes: 1

Views: 34

Answers (1)

IRTFM

Reputation: 263451

Step 1: Get the data into the R session with readLines

Lines <- readLines(textConnection("##BLOCKS= 8          
Plate:  Plate01 1.3 PlateFormat Endpoint    Absorbance  Raw FALSE   1               1   630 1   12  96  1   8   None    
Temperature(°C) 1       2       3       4       5       6       7       8       9       10      11      12      
0.00            0.042   0.067   0.292   0.206   0.071   0.067   0.04    0.063   0.059   0.04    0.066   0.04        
                0.043   0.172   0.179   0.199   0.073   0.067   0.04    0.062   0.058   0.039   0.066   0.039       
                 0.04   0.066   0.29    0.185   0.072   0.067   0.04    0.062   0.058   0.039   0.065   0.039       
                0.039   0.068   0.291   0.189   0.075   0.069   0.04    0.064   0.058   0.041   0.064   0.039       
                0.042   0.063   0.271   0.191   0.07    0.068   0.04    0.065   0.058   0.041   0.066   0.04        
                0.041   0.067   0.342   0.199   0.069   0.066   0.041   0.065   0.057   0.04    0.065   0.042       
                0.044   0.064   0.295   0.198   0.069   0.067   0.039   0.064   0.057   0.04    0.067   0.041       
                0.041   0.067   0.29    0.211   0.066   0.067   0.043   0.056   0.058   0.042   0.067   0.042       

~End
Plate:  Plate#1 1.3 PlateFormat Endpoint    Absorbance  Raw FALSE   1                       1   630 1   12  96  1   8   None    
Temperature(°C) 1       2       3       4       5       6       7       8       9       10      11      12      
0.00            0.042   0.072   0.257   0.165   0.074   0.07    0.04    0.067   0.055   0.04    0.07    0.04        
                0.042   0.164   0.136   0.195   0.075   0.07    0.041   0.066   0.055   0.04    0.069   0.04        
                0.041   0.07    0.344   0.198   0.074   0.069   0.041   0.065   0.055   0.04    0.068   0.04        
                0.04    0.069   0.307   0.199   0.075   0.072   0.041   0.067   0.055   0.043   0.068   0.041       
                0.043   0.068   0.296   0.214   0.072   0.071   0.042   0.067   0.055   0.041   0.068   0.041       
                0.041   0.071   0.452   0.241   0.072   0.069   0.042   0.067   0.054   0.041   0.068   0.043       
                0.044   0.068   0.299   0.182   0.071   0.071   0.042   0.067   0.054   0.041   0.069   0.041       
                0.042   0.071   0.333   0.13    0.068   0.07    0.042   0.058   0.054   0.042   0.07    0.041       

~End"))
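The textConnection() call above only makes the example reproducible. With a real file on disk, readLines() can take the path directly. A minimal sketch, assuming a hypothetical file (a temporary file with made-up contents is written here so the snippet runs on its own):

```r
# Hypothetical stand-in for the real plate-reader export.
path <- tempfile(fileext = ".txt")
writeLines(c("##BLOCKS= 1",
             "Temperature(C) 1 2",
             "0.00 0.042 0.067",
             "~End"), path)

# readLines() returns a character vector with one element per line.
Lines <- readLines(path)
length(Lines)
```

If the degree sign in the real file comes through garbled, the encoding argument of readLines() (or of file()) is probably the place to look, although which value is needed depends on how the instrument wrote the file.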

Steps 2 & 3: Build a condition that keeps only the good data lines, then build a grouping variable

?strsplit 
# Couldn't remember name of `substr`, figured the ?strsplit  page would show link

start <- substr(Lines, 1,1)  # 1st char was sufficient to build a rule
table(start)
#--- result ----
# start
#        #  ~  0  P  T
#  2 14  1  2  2  2  2
# The 2 is the count of empty lines (""); the 14 is the count of lines starting with " " (a space)
#end table
goodL <- Lines[start %in% c(" ","T","0")  ]
goodL  # Look at result

group <- cumsum(substr(goodL , 1,4)=="Temp")  #build grouping
group   # check the grouping variable
#  [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
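The filter and the cumsum() trick are easier to see on a small input. The sketch below uses made-up toy lines (not the real file) with the same overall layout: each "Temp" header yields TRUE, and cumsum() turns those TRUEs into a counter that increases by one at the start of every block.

```r
# Toy stand-in for Lines: two plate blocks with "~End" markers.
Lines <- c("##BLOCKS= 2",
           "Plate: A",
           "Temperature 1 2",
           "0.00 0.1 0.2",
           "     0.3 0.4",
           "",
           "~End",
           "Plate: B",
           "Temperature 1 2",
           "0.00 0.5 0.6",
           "     0.7 0.8",
           "",
           "~End")

start <- substr(Lines, 1, 1)                  # first character of every line
goodL <- Lines[start %in% c(" ", "T", "0")]   # keep header and data lines only

is_header <- substr(goodL, 1, 4) == "Temp"    # TRUE at the start of each block
group <- cumsum(is_header)                    # increases by one at each header
split(goodL, group)                           # list of two character vectors
```

Note that the rule relies on the temperature value starting with "0". If a real file has other temperatures in that column, the set of accepted first characters would need to be widened.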

Step 4: Process the groups with lapply(split(goodL, group), function(x) ...)

# First attempt: keep characters 16 to 100 of each line (drops the temperature column)
dfrms <- lapply(split(goodL, group),
             function(x) read.table(text = substr(x, 16, 100), header = TRUE))
str(dfrms)  # check result: not correct, only 11 variables, so the 12th column was cut off
# List of 2
#  $ 1:'data.frame':  8 obs. of  11 variables:
#   ..$ X1 : num [1:8] 0.042 0.043 0.04 0.039 0.042 0.041 0.044 0.041
#   ..$ X2 : num [1:8] 0.067 0.172 0.066 0.068 0.063 0.067 0.064 0.067
#   ----- snipped output

# Second attempt: extend the end position to 120 so the 12th column is included
dfrms <- lapply(split(goodL, group),   # will be a list of data frames
            function(x) read.table(text = substr(x, 16, 120), header = TRUE))
str(dfrms)   # Looks good
# List of 2
#  $ 1:'data.frame':  8 obs. of  12 variables:
#   ..$ X1 : num [1:8] 0.042 0.043 0.04 0.039 0.042 0.041 0.044 0.041
#   ..$ X2 : num [1:8] 0.067 0.172 0.066 0.068 0.063 0.067 0.064 0.067
#   ..$ X3 : num [1:8] 0.292 0.179 0.29 0.291 0.271 0.342 0.295 0.29
#   ----- snipped output
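The fixed character positions passed to substr() depend on the exact width of the lines, which is why the first attempt lost a column. One possible alternative, not part of the original strategy, is to strip the first field with a regular expression instead. This is a sketch under the assumption that the first field never contains a space; strip_first is a hypothetical helper name, and goodL here is a shortened toy version with three columns.

```r
# Toy stand-in for goodL: two blocks, three data columns each.
goodL <- c("Temperature(C)  1       2       3",
           "0.00            0.042   0.067   0.292",
           "                0.043   0.172   0.179",
           "Temperature(C)  1       2       3",
           "0.00            0.042   0.072   0.257",
           "                0.042   0.164   0.136")

group <- cumsum(substr(goodL, 1, 4) == "Temp")

# Hypothetical helper: remove the first run of non-space characters (if any)
# and the whitespace after it, whatever the column widths are.
strip_first <- function(x) sub("^\\S*\\s+", "", x, perl = TRUE)

dfrms <- lapply(split(goodL, group),
                function(x) read.table(text = strip_first(x), header = TRUE))
str(dfrms)
```

Because nothing is counted by position, this variant should also be less sensitive to how the degree sign in the header is encoded.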

I'd like to give credit to @G.Grothendieck for this strategy. Doing a search on "user:516548 readLines" will pull up many other elegant examples of a similar approach.

Upvotes: 3
