chainhomelow

Reputation: 347

Using Matlab regexp on a cell array to return specific filenames

I have a folder in which there are many files and I want to create a matrix that holds filenames with a specific pattern. For example: The folder contains files with names starting with a subject number (e.g. 03T1A.xxx.nii, 03T1A.yyy.nii) as well as filenames with specific patterns in the middle (e.g. 03T1A.c100.nii, 03T1A.c200.nii, 03T1A.c300.nii). In this specific case I am looking to extract all the filenames with the pattern c1 and c2 in the middle (e.g. 03T1A.c100.nii and 03T1A.c200.nii but not 03T1A.c300.nii).

To this point I have used the following code to create a pattern matching variable in 'pattern' which I would like to apply to the cell array of filenames I have extracted into the variable 'all_files' via the dir call.

func_path = char(strcat(input_dir, '/', subs(files), '/Func'));
pattern = 'c[12]*.nii'
all_files = dir(func_path); 
all_files = {all_files.name};

I'd like to use (read: practice) regexp. Doing it with string input seems easy, but I am 100% stumped as to how to do it with cell input. I started trying to do something like this:

files = all_files(cellfun(@(x)regexp(x, pattern));

But it doesn't work, obviously. Could someone help me figure out what to do here if my ultimate goal is to get a matrix output with just the relevant filenames? I've been searching MATLAB Answers and other Stack Overflow posts, but part of my problem is that I don't understand what's happening in their code snippets. I took the above line (or at least the beginning of it) from another post, but I don't know what, for example, 'x' is (an output variable?) or what's going on in a larger command such as

fin = cellfun(@(x)regexp(x, '\.', 'split'), res, 'UniformOutput', false)

Which I found in another thread. So basically, can someone help me figure out a command that will work while explaining it to me?

Upvotes: 1

Views: 985

Answers (1)

Suever

Reputation: 65430

A couple of recommendations for doing this sort of thing:

  1. Do not use strcat and '/' characters to construct file paths. strcat strips trailing whitespace from character-array inputs before concatenating, and a folder or file name may legitimately end in whitespace. Also, rather than hard-coding a path separator such as '/', use filesep or, better yet, fullfile, so that the path is built correctly on every platform.

    func_path = fullfile(input_dir, subs(files), 'Func');
    
  2. regexp works directly on cell arrays, so you can simply do:

    all_files = dir(func_path); 
    
    % Search for the pattern in all filenames
    matches = regexp({all_files.name}, pattern);
    
    % Get the filenames of those that matched
    all_files = {all_files(~cellfun('isempty', matches)).name};
    
  3. Your pattern isn't matching any files because it only matches a "c", followed by zero or more 1's or 2's, followed by exactly one arbitrary character and then "nii". In your filenames more than one character separates the "c1" or "c2" from "nii", so nothing matches. Instead, you'll want to use .* to match anything between the "c1" or "c2" and the file extension. You'll also want to drop the * after [12], since [12]* is satisfied by zero 1's or 2's and would therefore let c3 through as well. Finally, escape the . in .nii so that it's treated as a literal dot rather than a wildcard. For your pattern I would use something like

    pattern = 'c[12].*\.nii';
    
  4. If you really don't want to work with regular expressions, you could avoid all of this by simply using wildcards in your dir call:

    c1_files = dir(fullfile(func_path, '*c1*.nii'));
    c2_files = dir(fullfile(func_path, '*c2*.nii'));
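
To see how the two patterns from item 3 behave, here is a minimal sketch that runs regexp over a hard-coded cell array standing in for {all_files.name}. The sample names come from the question; the variable names (names, old_pattern, new_pattern, old_hits, new_hits, selected) are only for illustration and are not part of the original code.

```matlab
% Sample filenames from the question, standing in for {all_files.name}
names = {'03T1A.xxx.nii', '03T1A.c100.nii', '03T1A.c200.nii', '03T1A.c300.nii'};

old_pattern = 'c[12]*.nii';     % pattern from the question
new_pattern = 'c[12].*\.nii';   % pattern suggested in item 3

% Given a cell array, regexp returns a cell array of start indices,
% with an empty entry wherever the pattern was not found.
old_hits = ~cellfun('isempty', regexp(names, old_pattern));
new_hits = ~cellfun('isempty', regexp(names, new_pattern));

% Logical indexing keeps only the names that matched.
selected = names(new_hits);
disp(selected)
```

With these four names, old_hits should be false everywhere, while new_hits should pick out only the c100 and c200 files. If you need a character matrix rather than a cell array, char(selected) pads the names to equal length, although a cell array is usually easier to work with.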
    

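As for what 'x' is in the cellfun snippets from the question: it is the input argument of an anonymous function, not an output variable. The following is a rough sketch of the cellfun route, again using the question's sample names as stand-in data; is_match, files and parts are illustrative names only. It does the same job as passing the cell array to regexp directly, just more verbosely.

```matlab
names = {'03T1A.xxx.nii', '03T1A.c100.nii', '03T1A.c200.nii', '03T1A.c300.nii'};
pattern = 'c[12].*\.nii';

% @(x) ... defines an anonymous function whose input argument is x.
% cellfun calls it once per cell, with x set to that cell's contents.
% With 'once', regexp returns the first start index or [] when there is
% no match, so ~isempty(...) gives one logical value per filename.
is_match = cellfun(@(x) ~isempty(regexp(x, pattern, 'once')), names);

% Keep only the matching filenames.
files = names(is_match);

% 'UniformOutput', false tells cellfun to collect the results in a cell
% array, which is required when each call returns a non-scalar value
% (here, the pieces of each name split at every literal dot).
parts = cellfun(@(x) regexp(x, '\.', 'split'), names, 'UniformOutput', false);
```

The line in the question fails because cellfun must return one scalar per cell to build a regular array, whereas regexp returns an empty array for every non-matching name (and the line is also missing a closing parenthesis).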
Upvotes: 2
