Reputation: 11
I have about 50 files full of lines of data. I want a program that opens each file, reads line 8, and then sorts the files according to that line. Line 8 contains the longitude, so the file with the lowest longitude should come first. I have been trying in vain with the fget function and I am starting to think it is impossible. I posted this before but deleted it because I made a mistake in tagging.
An example of what a file looks like is shown below:
1 Cruise_Number: 2006002
2 Cruise_Name: ARCTICNET 0602
3 Original_Filename: CTD_2006002_016_1_DN.ODF
4 Station : Station BA04
5 Cast_Number : 016
6 Start_Date_Time [UTC]: 07-SEP-2006 02:05:00.00
7 Initial_Latitude [deg]: 75.277
8 Initial_Longitude [deg]: -74.9482
9 Sounding [m]: 489
10 Min_Depth [m]: 7.27
11 Max_Depth [m]: 462.57
Upvotes: 1
Views: 493
Reputation: 104515
First things first: use dir and fullfile to list all of the files in a folder / directory. The output of dir is a structure array with one element of information per file in your directory. The reason to use fullfile is that the path separator between folders / directories differs between operating systems. On Linux/Mac it's /, while on Windows it's \. To be operating system agnostic, let fullfile build the path of the directory for you.
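As a small illustration of that point, the sketch below builds a path from its parts and compares it with one assembled by hand using filesep, which returns the separator for the current platform. The folder names are placeholders, not real locations on your machine.

```matlab
%// Hypothetical folder names, for illustration only
folder = fullfile('data', 'cruise');

%// filesep is the path separator for the platform the code runs on
expected = ['data', filesep, 'cruise'];
sameAsManual = strcmp(folder, expected);   %// true on any platform

%// dir returns a struct array with one element per matching file;
%// it is empty if nothing matches
listing = dir(fullfile(folder, '*.txt'));
numFiles = numel(listing);
```

Because the separator is never typed by hand, the same script should run unchanged on Windows, Linux, and Mac.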
After that, use a for loop to iterate over each file: open it, do your processing, and close it. I'd use fgetl once the file is open, calling it 8 times to get to the 8th line. fgetl reads the line in as a string. Next, I would look for a number at the end of the line using regular expressions, convert this number from a string to an actual number, and place it into an array.
In the end you'll have 50 numbers, so sort them, get the corresponding sorted indices, and use those indices to reorder the output of dir.
I'm going to assume that all of your files are placed inside a single directory. Also assuming that all of your files have the extension .txt, do something like this:
folder = fullfile('path', 'to', 'folder'); %// Replace where your files are here
files = dir(fullfile(folder, '*.txt')); %// Find all files in folder
longitude = zeros(numel(files), 1); %// Initialize longitude array
for idx = 1 : numel(files) %// For each file
    fileID = fopen(fullfile(folder, files(idx).name)); %// Open up file
    for idx2 = 1 : 8
        str = fgetl(fileID); %// Skip to the 8th line
    end
    %// Extract out the number at the end of the string
    numCell = regexp(str, '-?\d+\.\d+$', 'match');
    %// Place into array
    longitude(idx) = str2double(numCell{1});
    %// Close file
    fclose(fileID);
end
%// Sort the longitudes and get the index ordering
[~,ind] = sort(longitude);
%// Also reorder files in structure
files = files(ind);
Let's step through this code line by line. In the first line, you specify where your files are located. Each folder in the path is passed as a separate string, so keep that in mind when modifying the first line of code. The second line finds all files that have a .txt extension in that directory. Keep in mind that the filenames returned by dir are relative to the input directory, so to access a file you'll need to use fullfile again. Next, we create an array to store the longitudes read in from each file.
Next, we loop through every text file in this directory. For each one, we open the file with fopen, then call fgetl 8 times, discarding the first 7 lines. The 8th line contains your text of interest. From this line, we use regular expressions to extract the number at the end of the string. Regular expressions are a mechanism for finding patterns in text. In your case, you want to find a specific pattern: a floating point number that may have a negative sign in front. We use the function regexp to search for the pattern. The first input is the string and the second input is the pattern we want to search for. The pattern we are looking for is rather cryptic at first sight:
-?\d+\.\d+$
This says that we are looking for a number that may optionally have a negative sign (-?), followed by a sequence of one or more digits (\d+), followed by a decimal point (\.), followed by another sequence of one or more digits (\d+), and we make sure that this happens at the end of the string ($). We use the flag 'match' to return the actual matched strings. Without it, regexp would return the locations where the matches were found, and that's not what we want. What is returned is a cell array of strings, and if everything went correctly there should be only one element in it. We simply access that element, convert it to an actual number via str2double, then store it in our longitude array.
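To make the pattern concrete, here is a minimal sketch that applies it to line 8 of the example file from the question (without the leading line number). The isempty guard and the NaN fallback are my own additions for illustration, not part of the code above.

```matlab
%// Line 8 of the example file, as fgetl would return it
str = 'Initial_Longitude [deg]: -74.9482';

%// 'match' returns the matched text in a cell array
numCell = regexp(str, '-?\d+\.\d+$', 'match');

if isempty(numCell)
    %// Assumption: mark lines that do not match with NaN instead of erroring
    lon = NaN;
else
    lon = str2double(numCell{1});
end
```

Note that this pattern requires a decimal point and must match at the very end of the string, so a line ending in an integer such as -75, or one with trailing spaces, would produce no match and fall into the NaN branch here.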
When we're done looping, we sort these longitudes and extract the index ordering. We then use this index ordering to reorder the structure array that was returned from dir, which completes the code. files will contain the file entries in sorted order with respect to longitude. I'm not sure what you'd want to do with this after, so I'll leave this answer as it is.
However, if you want to see a list of all of the files in sorted order, you can very simply do:
filesSorted = char(files.name);
This extracts all of the filenames and places them into a 2D character array where each row is a filename. The first row is the filename with the smallest longitude, the second row is the filename with the second smallest longitude, and so on.
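One caveat with the character array form: char pads shorter names with trailing spaces so that every row has the same length. A small sketch with made-up filenames (not taken from your data) shows the padding and how strtrim recovers the original name:

```matlab
%// Hypothetical filenames of different lengths
names = {'a.txt', 'longer_name.txt'};

%// Each name becomes one row; shorter rows are padded with spaces
c = char(names{:});
rowLength = size(c, 2);          %// length of the longest name

%// Remove the padding before using a row as a filename
firstName = strtrim(c(1, :));
```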
You can also place all of the names in a cell array by:
filesSorted = {files.name};
Then to access a particular filename, do:
file = filesSorted{idx};
idx is a number from 1 up to as many files as you have.
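To see how the index vector from sort keeps filenames and longitudes aligned, here is a small sketch with made-up values. The names and numbers are hypothetical and stand in for what the loop over your files would produce.

```matlab
%// Made-up longitudes and matching hypothetical filenames
longitude = [-74.9482; -120.5; -60.25];
names = {'a.txt'; 'b.txt'; 'c.txt'};

%// sort is ascending by default; ind records where each sorted value came from
[sortedLon, ind] = sort(longitude);

%// Applying the same index vector keeps the names in step with the values
namesSorted = names(ind);

%// Print each filename next to its longitude
for k = 1 : numel(namesSorted)
    fprintf('%s\t%.4f\n', namesSorted{k}, sortedLon(k));
end
```

The same index vector can reorder any other per-file data you collect, as long as it was stored in the same order as the original dir output.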
Good luck!
Upvotes: 3