Reputation: 41
I have high-dimensional brain-signal data that I would like to explore using R.
As a data scientist, I work with R and Python rather than Matlab. Unfortunately, the team I am working with uses Matlab to record the signals, so I have several questions for those of you who are interested in data science.
Each Matlab file holds the recorded data as a single object with the dimensions 1000*32*6000:
1000: the sampling rate of the signal, i.e. the number of samples per second.
32: the number of channels.
6000: the recording length in seconds, i.e. 1 hour and 40 minutes.
The questions/challenges I am facing:
I converted the "mat" files into CSV files so I can use them in R. However, a CSV file is two-dimensional, so the data ends up with the dimensions 1000*192000 (32*6000 = 192000 columns).
The CSV files are rather large, about 1.3 gigabytes each. Is there a better way to convert "mat" files into something that is compatible with R and smaller in size? I have tried readMat from "R.matlab", but it is not compatible with the 7th version of Matlab; when I saved the file as V6 instead, I got "Error: cannot allocate vector of size 5.7 Gb".
Reading the CSV file takes a long time: about 9 minutes to load the data. That is with "fread"; the base R function read.csv takes even longer. Is there a faster way to read the files?
Once I read the data into R, it is 1000*192000, while it is actually 1000*32*6000. Is there a way to have a multidimensional object in R that makes it easier to access the signal for a given time frame and channel? For example, dataset[1007,2] would be the time frame for second 1007 on channel 2. I want to access the data this way so that I can easily compare time frames and plot them against each other.
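The indexing described in the last point can be made concrete with a small sketch. It is written in plain Python rather than R, purely to show the arithmetic, and it rests on an assumption that is not stated above: that the 192000 CSV columns are ordered with the channel varying fastest (columns 1 to 32 are second 1, columns 33 to 64 are second 2, and so on), as would result from flattening a column-major 1000*32*6000 array. The names `flat_column` and `time_frame` are hypothetical helpers, not part of any library.

```python
N_CHANNELS = 32  # channels per second-long time frame (from the question)


def flat_column(second, channel, n_channels=N_CHANNELS):
    """Return the 0-based column of the flattened 2-D table that holds the
    given 1-based (second, channel) pair.

    Assumption: channel varies fastest across columns.
    """
    if second < 1:
        raise ValueError("second must be >= 1")
    if not 1 <= channel <= n_channels:
        raise ValueError("channel out of range")
    return (second - 1) * n_channels + (channel - 1)


def time_frame(table, second, channel, n_channels=N_CHANNELS):
    """Extract one time frame (one value per sample row) from a table stored
    as a list of rows, i.e. the equivalent of dataset[second, channel]."""
    col = flat_column(second, channel, n_channels)
    return [row[col] for row in table]


# Tiny stand-in for the real data: 2 samples, 2 channels, 3 seconds.
toy = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
]
frame = time_frame(toy, second=2, channel=1, n_channels=2)
```

If the column order in the actual CSV export is different (for example, seconds varying fastest), the formula in `flat_column` has to be swapped accordingly, so it is worth checking a few known values first.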
Any answer to any question would be appreciated.
Upvotes: 0
Views: 83
Reputation: 787
This is a good reference for reading large CSV files: https://rpubs.com/msundar/large_data_analysis A key takeaway is to assign the data type of each column you are reading yourself, rather than letting the read function infer it from the content.
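As a rough illustration of that takeaway, here is a minimal sketch in Python's standard library (not R, and not taken from the linked reference). It assumes every field in the file is an unquoted number, which seems reasonable for exported signal data but is an assumption. The reader is told up front to treat all fields as floats, and each row is stored in a compact typed array. The helper name `read_numeric_csv` is hypothetical. In R, `fread` accepts a `colClasses` argument that serves the same purpose.

```python
import csv
import io
from array import array


def read_numeric_csv(handle):
    """Read a CSV in which every field is an unquoted number.

    QUOTE_NONNUMERIC makes csv.reader convert each unquoted field to float,
    so no per-column type inference is needed. Each row is kept in an
    array of C doubles, which is more compact than a list of Python floats.
    """
    reader = csv.reader(handle, quoting=csv.QUOTE_NONNUMERIC)
    return [array("d", row) for row in reader]


# Small in-memory stand-in for a real file opened with open(path, newline="").
sample = io.StringIO("1.5,2,3e-1\n4,5.25,6\n")
rows = read_numeric_csv(sample)
```

For a real file, the same function can be called with a handle from `open(path, newline="")`; whether this helps with a 1.3 GB file depends on available memory and has not been measured here.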
Upvotes: 0