CurtisBernard

Reputation: 11

Pulling a column of data with a set number of rows from multiple text files into one text file

I have several hundred text files. I want to extract a specific column, with a set number of rows, from each of them. The files all have exactly the same layout; only the data values differ. I want to put that data into a new text file, with the column from each file placed next to the column from the previous one.

Each file is a .sed file, which is basically the same as a .txt file. This is what it looks like (only the first few data rows are shown; the file actually goes from Wvl 350-2150):

Comment: 
Version: 2.2
File Name: C:\Users\HyLab\Desktop\Curtis 
Bernard\PSR+3500_1596061\PSR+3500_1596061\2019_Feb_16\Contact_00186.sed
<Metadata>
Collected By: 
Sample Name: 
Location: 
Description: 
Environment: 
</Metadata>
Instrument: PSR+3500_SN1596061 [3]
Detectors: 512,256,256
Measurement: REFLECTANCE
Date: 02/16/2019,02/16/2019
Time: 13:07:52.66,13:29:17.00
Temperature (C): 31.29,8.68,-5.71,31.53,8.74,-5.64
Battery Voltage: 7.56,7.20
Averages: 10,10
Integration: 2,2,2,10,8,2
Dark Mode: AUTO,AUTO
Foreoptic: PROBE  {DN}, PROBE  {DN}
Radiometric Calibration: DN
Units: None
Wavelength Range: 350,2500
Latitude: n/a
Longitude: n/a
Altitude: n/a
GPS Time: n/a
Satellites: n/a
Calibrated Reference Correction File: none
Channels: 2151
Columns [5]:
Data:
Chan.#  Wvl Norm. DN (Ref.) Norm. DN (Target)   Reflect. %
0    350.0   1.173460E+002   1.509889E+001      13.7935
1    351.0   1.202493E+002   1.529762E+001      13.6399
2    352.0   1.232869E+002   1.547818E+001      13.4636
3    353.0   1.264006E+002   1.563467E+001      13.2665
4    354.0   1.294906E+002   1.578425E+001      13.0723

I've taken some coding classes, but that was a long time ago. I figured this would be a fairly straightforward problem even for a novice coder (which I am not), but I can't seem to find anything like it, so I was hoping for help here.

I honestly don't need anything fancy. Just something like this would be amazing, so I don't have to copy and paste from each file!

12.3  11.3  etc...
12.3  11.3  etc...
12.3  11.3  etc...
etc.. etc.. etc...

Upvotes: 1

Views: 116

Answers (2)

nekomatic

Reputation: 6284

In Python 3.x with numpy:

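import glob
import numpy as np

# List of .sed file names; the '*.sed' pattern is an assumption, adjust it to your folder
file_list = sorted(glob.glob('*.sed'))
result_array = None
for sed_file in file_list:
    # Skip the 35 header lines and keep only the fifth column (index 4)
    reflectance_column = np.genfromtxt(sed_file, skip_header=35, usecols=4)
    # Append this file's column to the right of those collected so far
    result_array = (reflectance_column if result_array is None else
                    np.column_stack((result_array, reflectance_column)))
np.savetxt('outputfile.txt', result_array)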

Here

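  • skip_header=35 ignores the first 35 lines of each file (the header in the example you posted)
  • usecols=4 returns only the fifth column, because Python uses zero-based indexing
  • see the help for savetxt for further details, such as the number format and the delimiter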
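To see what the two genfromtxt parameters do, here is a minimal sketch on a made-up, shortened file held in memory. The three-line header and the variable name sample are assumptions for illustration only; a real .sed file like the one in the question would need skip_header=35.

```python
import io
import numpy as np

# Synthetic stand-in for a .sed file: 3 header lines instead of 35
sample = io.StringIO(
    "Columns [5]:\n"
    "Data:\n"
    "Chan.#  Wvl  Norm. DN (Ref.)  Norm. DN (Target)  Reflect. %\n"
    "0    350.0   1.173460E+002   1.509889E+001      13.7935\n"
    "1    351.0   1.202493E+002   1.529762E+001      13.6399\n"
    "2    352.0   1.232869E+002   1.547818E+001      13.4636\n"
)

# skip_header=3 drops the three non-numeric lines;
# usecols=4 keeps only the fifth column (zero-based index 4)
reflectance = np.genfromtxt(sample, skip_header=3, usecols=4)
print(reflectance)
```

The result is a one-dimensional array holding the three values from the Reflect. % column.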

Upvotes: 1

nekomatic

Reputation: 6284

In MATLAB R2016b or later, the easiest way to do this would be using readtable:

t = readtable('file.sed', delimitedTextImportOptions( ...
    'NumVariables', 5, 'DataLines', 36, ...
    'Delimiter', ' ', 'ConsecutiveDelimitersRule', 'join'));

where

  • file.sed is the name of the file
  • 'NumVariables', 5 means there are 5 columns of data
  • 'DataLines', 36 means the data starts on the 36th line and continues to the end of the file
  • 'Delimiter', ' ' means the character that separates the columns is a space
  • 'ConsecutiveDelimitersRule', 'join' means treat more than one space as if they were just one (rather than as if they separate empty columns of data).

This assumes that the example file you've posted is in the exact format of your real data. If it's different you may have to modify the parameters above, possibly with reference to the help for delimitedTextImportOptions or as an alternative, fixedWidthImportOptions.
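If the number of header lines does turn out to vary between files, one option is to look for the Data: line instead of hard-coding a line count. The following is a rough sketch in Python with numpy (not MATLAB), assuming every file has a line reading exactly Data: followed by one line of column titles, as in the posted example. The function names count_header_lines and read_reflectance are made up for illustration.

```python
import numpy as np

def count_header_lines(path, marker="Data:"):
    """Return how many lines precede the first numeric row.

    Assumes the layout of the posted example: a line reading 'Data:',
    then one line of column titles, then the numbers.
    """
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.strip() == marker:
                return line_number + 1  # also skip the column-title line
    raise ValueError(f"no '{marker}' line found in {path}")

def read_reflectance(path):
    """Read the fifth column (zero-based index 4) of one .sed file."""
    return np.genfromtxt(path, skip_header=count_header_lines(path), usecols=4)
```

Calling read_reflectance on each file in turn then gives one column per file, whatever the header length.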

Now you have a MATLAB table t with five columns, of which column 2 is the wavelengths and column 5 is the reflectances - I assume that's the one you want? You can access that column with

t(:,5)

So to collect all the reflectance columns into one table you would do something like

files = dir('*.sed');      % list the .sed files in the current folder (the pattern is an assumption)
fileList = {files.name};   % cell array of char, one file name per cell
resultTable = table;
for ii = 1:numel(fileList)
    sedFile = fileList{ii};
    t = readtable(sedFile, delimitedTextImportOptions( ...
        'NumVariables', 5, 'DataLines', 36, ...
        'Delimiter', ' ', 'ConsecutiveDelimitersRule', 'join'));
    t.Properties.VariableNames{5} = sprintf('Reflectance%d', ii);  
    resultTable = [resultTable, t(:,5)];
end

The t.Properties.VariableNames ... line is there because column 5 of t is called Var5 every time, but in the result table each variable name needs to be unique. Here we rename the output table variables Reflectance1, Reflectance2, etc., but you could change this to whatever you want, perhaps a name derived from the actual file name in sedFile, as long as it is a valid, unique variable name.
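The same idea of labelling each column carries over to the numpy approach. As a rough sketch in Python, assuming you already have the list of file names and one array per file, the file names (without folder or extension) can be written as a first line of the output. The function name save_with_names, the tab delimiter and the four-decimal format are choices made for illustration.

```python
import os
import numpy as np

def save_with_names(file_list, columns, out_path):
    """Write one column per file, with a first line naming each column."""
    # Use each file name without its folder or extension as the column label
    names = [os.path.splitext(os.path.basename(name))[0] for name in file_list]
    table = np.column_stack(columns)
    # comments='' stops savetxt from prefixing the header line with '# '
    np.savetxt(out_path, table, fmt="%.4f", delimiter="\t",
               header="\t".join(names), comments="")
```

Unlike MATLAB table variable names, these labels are plain text in the output file, so they do not have to be valid identifiers.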

Finally you can save the result table to a text file using writetable. See the MATLAB help for how to use that.

Upvotes: 1
