Vibhesh Kaul

Reputation: 2613

Read data from text file into cell array

I have multiple text files that contain data in this format:

File1.txt

subID    imageCondition  trial   textItem    imageFile   response    RT
Participant003   images  7   Is there a refrigerator?    07_targetPresent-refrigerator.jpg   z   1.436971
Participant003   images  6   Is there an oven mitt?  06_targetPresent-ovenmitt.jpg   z   0.519301
Participant003   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg     m   1.110664
Participant003   images  3   Is there a wine bottle?     03_targetAbsent-winebottle.jpg  m   1.278945
Participant003   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  z   2.672123
Participant003   images  5   Is there a blender?     05_targetPresent-blender.jpg    m   2.633802
Participant003   images  8   Is there a bucket?  08_targetPresent-bucket.jpg     m   2.596154
Participant003   images  4   Is there a surf board?  04_targetAbsent-surfboard.jpg   m   1.072850

File2.txt

subID    imageCondition  trial   textItem    imageFile   response    RT
Participant005   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg         0.000000
Participant005   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  m   8.213927
Participant005   images  6   Is there an oven mitt?  06_targetPresent-ovenmitt.jpg   z   3.569293
Participant005   images  4   Is there a surf board?  04_targetAbsent-surfboard.jpg       0.000000
Participant005   images  3   Is there a wine bottle?     03_targetAbsent-winebottle.jpg  m   8.538699
Participant005   images  7   Is there a refrigerator?    07_targetPresent-refrigerator.jpg   z   0.857319
Participant005   images  5   Is there a blender?     05_targetPresent-blender.jpg        0.000000
Participant005   images  8   Is there a bucket?  08_targetPresent-bucket.jpg     z   1.967220

I want to be able to read this data into a cell array so that I can individually access the values that are present in it.

I use the following code to read the data, but it only gives me each line as a single string, so I can't access the individual values. For example, I want all the values from the 'trial' or 'response' column.

function content = load_data(fileName)
fid = fopen(fileName,'r')
if fid > 0
   line_no =1;
   oneline{line_no} = fgetl(fid);
   while ischar(oneline{line_no})
      line_no = line_no +1;
      oneline{line_no} = fgetl(fid);
   endwhile
   fclose(fid)
   content = oneline;
endif
endfunction


for i= 1:size(txtFiles,2)
   data{i} = load_data(txtFiles{1,i});
end

for i=1:1:length(data)
   dataMat = cell2mat(data(i));
   for j=1:1:length(dataMat)
      line = dataMat{1,j};
      % Here I can only fetch each line as one string whose fields are separated by a varying number of spaces, which makes it difficult to access the required data
   endfor            
endfor

What I'm looking for is a way to read this data from a text file into a cell array or a matrix so that I can easily access the required values, but I am restricted to the traditional methods of importing data from a text file. Alternatively, help with parsing the lines so that I can access what I need would be enough.

Note: There are multiple text files like these. It would also be a great help if you could show how to access the values in an individual column, e.g. the 'response' column.

Upvotes: 0

Views: 610

Answers (1)

Mike Scannell

Reputation: 378

This would be easy with something like strsplit, splitting the data on spaces, except that your textItem field contains spaces. So I would suggest regular expressions instead. Named tokens are a convenient way to organize the results when you're extracting several pieces at once. I realize that regular expressions are tough to jump into if you're not familiar with them. Check out regex101.com for information and a very useful online tool for testing your regular expression. See this specific example on regex101. That said, here's my answer, which works on your File1.txt data:

% Read the whole file as one character vector, then match one row per line.
text = fileread(filename);
data = regexp(text,'^(?<subID>\w+)\s+(?<imageCondition>\w+)\s+(?<trial>\d+)\s+(?<textItem>.*?\?)\s+(?<imageFile>[-\.\w]+)\s+(?<response>\w)\s+(?<RT>[\d\.]+)','names','lineanchors');

Or you could turn it into a table:

dataTable = struct2table(data)

Result looks like:

      subID           imageCondition    trial              textItem                            imageFile                  response         RT     
__________________    ______________    _____    ____________________________    _____________________________________    ________    ____________

{'Participant003'}      {'images'}      {'7'}    {'Is there a refrigerator?'}    {'07_targetPresent-refrigerator.jpg'}     {'z'}      {'1.436971'}
{'Participant003'}      {'images'}      {'6'}    {'Is there an oven mitt?'  }    {'06_targetPresent-ovenmitt.jpg'    }     {'z'}      {'0.519301'}
{'Participant003'}      {'images'}      {'1'}    {'Is there a toaster?'     }    {'01_targetAbsent-toaster.jpg'      }     {'m'}      {'1.110664'}
{'Participant003'}      {'images'}      {'3'}    {'Is there a wine bottle?' }    {'03_targetAbsent-winebottle.jpg'   }     {'m'}      {'1.278945'}
{'Participant003'}      {'images'}      {'2'}    {'Is there a kettle?'      }    {'02_targetAbsent-kettle.jpg'       }     {'z'}      {'2.672123'}
{'Participant003'}      {'images'}      {'5'}    {'Is there a blender?'     }    {'05_targetPresent-blender.jpg'     }     {'m'}      {'2.633802'}
{'Participant003'}      {'images'}      {'8'}    {'Is there a bucket?'      }    {'08_targetPresent-bucket.jpg'      }     {'m'}      {'2.596154'}
{'Participant003'}      {'images'}      {'4'}    {'Is there a surf board?'  }    {'04_targetAbsent-surfboard.jpg'    }     {'m'}      {'1.072850'}

If you want to turn the numeric fields into numbers:

dataTable.trial = str2double(dataTable.trial);
dataTable.RT = str2double(dataTable.RT);

Which then gives:

      subID           imageCondition    trial              textItem                            imageFile                  response      RT  
__________________    ______________    _____    ____________________________    _____________________________________    ________    ______

{'Participant003'}      {'images'}        7      {'Is there a refrigerator?'}    {'07_targetPresent-refrigerator.jpg'}     {'z'}       1.437
{'Participant003'}      {'images'}        6      {'Is there an oven mitt?'  }    {'06_targetPresent-ovenmitt.jpg'    }     {'z'}      0.5193
{'Participant003'}      {'images'}        1      {'Is there a toaster?'     }    {'01_targetAbsent-toaster.jpg'      }     {'m'}      1.1107
{'Participant003'}      {'images'}        3      {'Is there a wine bottle?' }    {'03_targetAbsent-winebottle.jpg'   }     {'m'}      1.2789
{'Participant003'}      {'images'}        2      {'Is there a kettle?'      }    {'02_targetAbsent-kettle.jpg'       }     {'z'}      2.6721
{'Participant003'}      {'images'}        5      {'Is there a blender?'     }    {'05_targetPresent-blender.jpg'     }     {'m'}      2.6338
{'Participant003'}      {'images'}        8      {'Is there a bucket?'      }    {'08_targetPresent-bucket.jpg'      }     {'m'}      2.5962
{'Participant003'}      {'images'}        4      {'Is there a surf board?'  }    {'04_targetAbsent-surfboard.jpg'    }     {'m'}      1.0729
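
One caveat, offered as a sketch rather than a tested fix: the File2.txt rows with a blank response have nothing for (?<response>\w) to match, so the pattern above would not parse them correctly. Assuming the columns are separated by spaces or tabs, a variant that makes the response optional and keeps every match on a single line might look like this. The sample text and the variable names pattern and rows are mine, not from the question:

```matlab
% Two rows copied from File2.txt; the first has a blank response column.
% This stands in for text = fileread(filename).
text = sprintf([ ...
    'Participant005   images  1   Is there a toaster?     01_targetAbsent-toaster.jpg         0.000000\n' ...
    'Participant005   images  2   Is there a kettle?  02_targetAbsent-kettle.jpg  m   8.213927\n']);

% Assumed variant of the pattern above: [ \t]+ and [^\n] keep each match on
% one line, and \w? allows the response to be empty.
pattern = ['^(?<subID>\w+)[ \t]+(?<imageCondition>\w+)[ \t]+(?<trial>\d+)[ \t]+' ...
           '(?<textItem>[^\n]*?\?)[ \t]+(?<imageFile>[-\.\w]+)[ \t]+' ...
           '(?<response>\w?)[ \t]+(?<RT>[\d\.]+)'];

rows = regexp(text, pattern, 'names', 'lineanchors');

% A missing response comes back as an empty character vector.
noResponse = arrayfun(@(r) isempty(r.response), rows);
```

Here noResponse is a logical vector marking the trials where no key was pressed, which you could use to filter those rows out before analysing RT.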

You also asked how to access it. Get the third "response" from the table:

dataTable.response{3}

Or from the structure:

data(3).response
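
Since you asked for all the values in a column: from the table, dataTable.response or dataTable.trial already returns the whole column. From the structure array, something along these lines should work. This is a minimal sketch in which the small data struct is a stand-in I built by hand from the first three rows of File1.txt, and zTrials is just an illustrative name:

```matlab
% Stand-in for the struct array that regexp(...,'names') returns above;
% the values are copied from the first three rows of File1.txt.
data = struct('trial', {'7', '6', '1'}, 'response', {'z', 'z', 'm'});

% Collect one field from every element into a cell array.
responses = {data.response};

% Numeric columns can be converted in the same step.
trials = str2double({data.trial});

% Example use: the trial numbers that were answered with 'z'.
zTrials = trials(strcmp(responses, 'z'));
```

The {data.field} form gathers that field from every element of the structure array, so the same line works for any of the seven columns.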

Upvotes: 1
