Reputation: 39
I have this huge matrix of about 800k rows (this is just a small piece so you can see how it is laid out) with 24 hours of data for every day of every month of a whole year, from about 7 toll stations around a city. I need to store in a new matrix how many cars paid in cash and how many used the electronic toll device, day by day, for all 365 days, and then graph the whole thing. In this case I don't have to discriminate between the toll stations, so I know I'll need columns 1, 2, 8 and 9. 101 means cash and 106 means toll pass, but honestly I don't know how to operate on such a big matrix. I'm fairly new to Octave/MATLAB and programming in general, so thank you very much for any advice.
#Month Day Hour weekday(1to7) TollStation Direction Vehicle type method of payment Amount of vehicles
1 1 0 3 1 1 1 106 6
1 1 0 3 1 2 1 106 18
2 4 0 3 2 1 1 101 16
2 5 0 3 2 1 1 106 159
3 17 0 3 2 1 2 106 5
4 15 0 3 2 2 1 101 12
5 19 0 3 2 2 1 106 182
6 1 0 3 3 1 1 106 98
7 1 0 3 3 1 2 106 6
8 1 0 3 3 2 1 106 67
9 1 0 3 3 2 2 106 6
10 1 0 3 4 1 1 106 59
11 1 0 3 4 1 2 106 1
12 1 0 3 4 2 1 106 106
EDIT: I'm opening the file like this:
file=fopen('FlujoVehicular2019.txt'); %open file
arreglo=fscanf(file, '%i',[9,812513]); %reads file
fclose(file); %close file
M = arreglo';
[nRow, ~] = size(M);
elec_rows=find(M(:,1)==1 & M(:,2)==1 & M(:,8)==106); %filters month 1, day 1 electronic payments (code 106)
a = sum(M(elec_rows,9)); %sums all electronic payments from month 1 day 1
disp(a)
Now I need to store this data somewhere and then move on to month 1 day 2, month 1 day 3, and so on. How can I do that? Thanks again.
Upvotes: 1
Views: 76
Reputation: 1610
So, 800k x 9 is big, but it's not 'that big'. Matlab/Octave should have little trouble dealing with that much data.
For example - on Octave, creating a random 800,000 x 9 array:
>> a = rand(800000,9);
>> whos
Variables visible from the current scope:
variables in scope: top scope
Attr Name Size Bytes Class
==== ==== ==== ===== =====
a 800000x9 57600000 double
Total is 7200000 elements using 57600000 bytes
took no measurable time. Saving the data as text to disk, however, using
>> csvwrite('testdata.dat', a);
and
>> b = csvread('testdata.dat');
created a 135MB file and each took several minutes. Still quite manageable. There are also a number of file I/O functions that may be faster than the ones I used above.
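As one possible faster route, Octave's save can write its own binary format instead of text. The following is only a sketch under that assumption: it is Octave-specific (MATLAB's save takes different options), the temporary file name is made up for illustration, and the timings will depend on your machine.

```octave
% Octave-specific sketch: round-trip the same kind of array in binary form.
a = rand(800000, 9);
fname = [tempname() '.bin'];        % hypothetical temporary file name

tic;
save('-binary', fname, 'a');        % Octave binary format, no text conversion
tSave = toc;

tic;
s = load(fname);                    % with an output argument, load returns a struct
tLoad = toc;

fprintf('save: %.2f s, load: %.2f s\n', tSave, tLoad);
delete(fname);                      % clean up the temporary file
```

The loaded matrix is available as s.a. A binary file is not human-readable, so this only helps once the original text file has been read in once.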
Step 1 is to read in your data. There are a number of functions for doing this; see the Octave manual section on Simple File I/O. The main issue with these simple functions is your header row, and the data you pasted appears to use runs of several whitespace characters as the separator. If it were a single space, tab, or comma, it would be simple. The following works, using dlmread with an empty input [] as the delimiter so that the function figures it out for itself, and specifying to skip 1 row and 0 columns of the data:
>> mydata = dlmread('testdata2.txt',[], 1, 0);
warning: implicit conversion from null_matrix to sq_string
mydata =
1 1 0 3 1 1 1 106 6
1 1 0 3 1 2 1 106 18
2 4 0 3 2 1 1 101 16
2 5 0 3 2 1 1 106 159
3 17 0 3 2 1 2 106 5
4 15 0 3 2 2 1 101 12
5 19 0 3 2 2 1 106 182
6 1 0 3 3 1 1 106 98
7 1 0 3 3 1 2 106 6
8 1 0 3 3 2 1 106 67
9 1 0 3 3 2 2 106 6
10 1 0 3 4 1 1 106 59
11 1 0 3 4 1 2 106 1
12 1 0 3 4 2 1 106 106
(The warning is just about how it handles the empty delimiter input. Again, it would be cleaner with a simpler delimiter.) Other functions would also work, which involve defining input patterns for the function to recognize, but dlmread seems to do the trick.
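As a sketch of the pattern-based alternative, fgetl can discard the header line and fscanf can then read the rest as integers, much like the code in the question's edit. The sample file written here is only a stand-in for the real one, and the shortened header text is an assumption.

```octave
% Write a tiny stand-in file with a '#' header line and uneven spacing.
fname = [tempname() '.txt'];         % hypothetical temporary file name
fid = fopen(fname, 'w');
fprintf(fid, '#Month Day Hour weekday TollStation Direction VehicleType Payment Amount\n');
fprintf(fid, '1   1  0 3 1 1 1 106   6\n');
fprintf(fid, '1   1  0 3 1 2 1 106  18\n');
fprintf(fid, '2   4  0 3 2 1 1 101  16\n');
fclose(fid);

% Read it back: skip the header, then parse whitespace-separated integers.
fid = fopen(fname, 'r');
fgetl(fid);                          % read and discard the header line
raw = fscanf(fid, '%d', [9, Inf]);   % fills column by column: 9 values per file row
fclose(fid);
mydata = raw';                       % transpose so each file row is a matrix row
delete(fname);                       % clean up the temporary file
```

Using Inf for the second dimension means the row count does not have to be known in advance.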
After the data is read into the program, many of the other things you mentioned are fairly straightforward data manipulation once you learn how to work with the different functions and how Matlab/Octave handle indexing and logical operators to extract data subsets. Working with the entire 800k-line dataset is no different than working with the small dataset you included above.
For example, to get how many cars paid by cash or electronically across the whole dataset (add a month or day condition to narrow it down):
>> elec_rows=find(mydata(:,8)==106)
elec_rows =
1
2
4
5
7
8
9
10
11
12
13
14
>> cash_rows=find(mydata(:,8)==101)
cash_rows =
3
6
>> sum(mydata(elec_rows,9))
ans = 713
>> sum(mydata(cash_rows,9))
ans = 28
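The totals above cover the whole sample. To break them down by month, one option is accumarray, which sums values that share the same index. This is a minimal sketch on the sample rows; the variable names (isElec, isCash, elecPerMonth, cashPerMonth) are illustrative and not from the original post.

```octave
% Sample rows from the question: columns 1, 8 and 9 are month, payment code, count.
mydata = [ 1  1 0 3 1 1 1 106   6
           1  1 0 3 1 2 1 106  18
           2  4 0 3 2 1 1 101  16
           2  5 0 3 2 1 1 106 159
           3 17 0 3 2 1 2 106   5
           4 15 0 3 2 2 1 101  12
           5 19 0 3 2 2 1 106 182
           6  1 0 3 3 1 1 106  98
           7  1 0 3 3 1 2 106   6
           8  1 0 3 3 2 1 106  67
           9  1 0 3 3 2 2 106   6
          10  1 0 3 4 1 1 106  59
          11  1 0 3 4 1 2 106   1
          12  1 0 3 4 2 1 106 106 ];

% Logical masks can be used directly as row indices; find() is not required.
isElec = mydata(:, 8) == 106;
isCash = mydata(:, 8) == 101;

% Sum column 9 grouped by month (column 1); months with no rows stay at 0.
elecPerMonth = accumarray(mydata(isElec, 1), mydata(isElec, 9), [12 1]);
cashPerMonth = accumarray(mydata(isCash, 1), mydata(isCash, 9), [12 1]);
```

Each result is a 12x1 vector with one total per month, so the two entries for month 1 are already combined.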
and a plotting example:
>> plot(mydata(elec_rows,1), mydata(elec_rows,9), mydata(cash_rows,1), mydata(cash_rows,9));
>> xlabel('month');ylabel('cars');title('cars per month');
>> legend('electronic payment', 'cash payment');
(It's not quite right for month 1, since there are two entries for electronic payment in month 1, but you could sum the data by month, day, etc. before plotting.)
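For the per-day totals the question asks about, one way to do that summing is to turn month and day into a day-of-year index and let accumarray add up the counts. This is a sketch, not a definitive solution: the year 2019 is an assumption taken from the file name in the question, and the names doy and daily are illustrative.

```octave
% A few sample rows; with the real data, M would be the full 800k x 9 matrix.
M = [ 1 1 0 3 1 1 1 106   6
      1 1 0 3 1 2 1 106  18
      2 4 0 3 2 1 1 101  16
      2 5 0 3 2 1 1 106 159 ];

% Assumed year (from 'FlujoVehicular2019.txt'); 2019 is not a leap year.
yr = 2019 * ones(size(M, 1), 1);

% Day of year: 1 for Jan 1, 365 for Dec 31.
doy = datenum(yr, M(:, 1), M(:, 2)) - datenum(2019, 1, 1) + 1;

isCash = M(:, 8) == 101;
isElec = M(:, 8) == 106;

% One row per day; column 1 = cash vehicles, column 2 = electronic vehicles.
daily = [ accumarray(doy(isCash), M(isCash, 9), [365 1]), ...
          accumarray(doy(isElec), M(isElec, 9), [365 1]) ];
```

The result is a 365x2 matrix, so no loop over days is needed, and something like plot(1:365, daily) followed by legend('cash payment', 'electronic payment') would graph both series.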
Upvotes: 1