rodrigo guevara

Reputation: 39

Storing day-to-day data from a matrix in another matrix (Octave / MATLAB)

I have a huge matrix of about 800k rows (the sample below is just a small piece, to show how it is laid out). It holds 24-hour data for every day of every month of a whole year, from about 7 toll stations around a city. I need to store in a new matrix how many cars paid in cash and how many used the electronic toll device, day by day for all 365 days, and then graph the whole thing. In this case I don't have to distinguish between the toll stations, so I know I'll need columns 1, 2, 8 and 9. In column 8, 101 means cash and 106 means toll pass. Honestly, I don't know how to operate on such a big matrix; I'm fairly new to Octave/MATLAB and to programming in general, so thank you very much for any advice.

#Month  Day Hour    weekday(1to7)   TollStation Direction   Vehicle type    method of payment   Amount of vehicles
1   1   0   3   1   1   1   106 6
1   1   0   3   1   2   1   106 18
2   4   0   3   2   1   1   101 16
2   5   0   3   2   1   1   106 159
3   17  0   3   2   1   2   106 5
4   15  0   3   2   2   1   101 12
5   19  0   3   2   2   1   106 182
6   1   0   3   3   1   1   106 98
7   1   0   3   3   1   2   106 6
8   1   0   3   3   2   1   106 67
9   1   0   3   3   2   2   106 6
10  1   0   3   4   1   1   106 59
11  1   0   3   4   1   2   106 1
12  1   0   3   4   2   1   106 106


EDIT: I'm opening the file like this:

file=fopen('FlujoVehicular2019.txt'); %open file
arreglo=fscanf(file, '%i',[9,812513]); %reads file
fclose(file); %close file
M = arreglo';

[nRow, ~] = size(M);


elec_rows=find(M(:,1)==1 & M(:,2)==1 & M(:,8)==106); %filters month 1, day 1 electronic payments (106 = toll pass)

a = sum(M(elec_rows,9)); %sums all electronic payments from month 1 day 1

disp(a)

Now I need to store this result somewhere and then move on to month 1 day 2, month 1 day 3, and so on. How can I do that? Thanks again.


Answers (1)

Nick J

Reputation: 1610

So, 800k x 9 is big, but it's not 'that big'. Matlab/Octave should have little trouble dealing with that much data.

For example - on Octave, creating a random 800,000 x 9 array:

>> a = rand(800000,9);

>> whos 
Variables visible from the current scope:

variables in scope: top scope

   Attr Name        Size                     Bytes  Class
   ==== ====        ====                     =====  =====
        a      800000x9                   57600000  double

Total is 7200000 elements using 57600000 bytes

took no measurable time. Saving the data as text to disk, however, using

>> csvwrite('testdata.dat', a);

and

>> b = csvread('testdata.dat');

created a 135MB file, and each call took several minutes. Still quite manageable. There are also a number of file I/O functions that may be faster than the ones I used above.

So step 1 is to read in your data. There are a number of functions for doing this; see the Octave manual section on Simple File I/O. The main issue in using these simple functions is your header row, and the data you pasted appears to use runs of multiple whitespace characters as the delimiter. If it were a single space, tab, or comma, it would be simple. The following works, using dlmread with an empty input [] as the delimiter so that the function figures it out for itself, and specifying to skip 1 row and 0 columns of the data:

>> mydata = dlmread('testdata2.txt',[], 1, 0);  
warning: implicit conversion from null_matrix to sq_string
mydata =

     1     1     0     3     1     1     1   106     6
     1     1     0     3     1     2     1   106    18
     2     4     0     3     2     1     1   101    16
     2     5     0     3     2     1     1   106   159
     3    17     0     3     2     1     2   106     5
     4    15     0     3     2     2     1   101    12
     5    19     0     3     2     2     1   106   182
     6     1     0     3     3     1     1   106    98
     7     1     0     3     3     1     2   106     6
     8     1     0     3     3     2     1   106    67
     9     1     0     3     3     2     2   106     6
    10     1     0     3     4     1     1   106    59
    11     1     0     3     4     1     2   106     1
    12     1     0     3     4     2     1   106   106

(The warning is only about how it handles the empty delimiter input. Again, it would be cleaner with a simpler delimiter.) Other functions that work by defining input patterns for the function to recognize would also do the job, but dlmread seems to do the trick.
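As one illustration of the pattern-based route, here is a minimal sketch that skips the header with fgetl and then reads nine numbers per row with fscanf, much like the code in the question. The temporary file and its two data rows are made up so the snippet runs on its own; with the real file you would only need the second half.

```octave
% Write a tiny stand-in file (hypothetical name and contents, for illustration only).
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '#Month  Day Hour    weekday(1to7)   TollStation Direction   Vehicle type    method of payment   Amount of vehicles\n');
fprintf(fid, '1   1   0   3   1   1   1   106 6\n');
fprintf(fid, '2   4   0   3   2   1   1   101 16\n');
fclose(fid);

% Read it back: drop the header line, then take 9 numbers per row.
fid = fopen(fname, 'r');
header = fgetl(fid);                     % consume and ignore the header row
mydata = fscanf(fid, '%f', [9, Inf])';   % fscanf fills column-wise, so transpose
fclose(fid);
delete(fname);                           % clean up the stand-in file

disp(mydata)
```

Because fscanf skips any amount of whitespace between numbers, the irregular spacing in the file does not matter here, and using Inf avoids hard-coding the row count.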

After the data is read in, most of what you describe is fairly straightforward data manipulation once you learn how to work with the different functions and how MATLAB/Octave handle indexing and logical operators to extract data subsets. Working with the entire 800k-line dataset is no different than working with the small dataset you included above.

For example, to get how many cars paid in cash or electronically across the whole dataset:

>> elec_rows=find(mydata(:,8)==106)
elec_rows =

    1
    2
    4
    5
    7
    8
    9
   10
   11
   12
   13
   14

>> cash_rows=find(mydata(:,8)==101)
cash_rows =

   3
   6

>> sum(mydata(elec_rows,9))
ans = 713

>> sum(mydata(cash_rows,9))
ans = 28
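The same masks extend to per-day totals, which is what the question asks for. The following is only a sketch under two assumptions: the year is not a leap year (the file name in the question suggests 2019), and a 365x2 layout with cash in column 1 and electronic in column 2 is acceptable. The names monthStart, dayOfYear and daily are mine, not from the question.

```octave
% Sample rows from the question: month, day, hour, weekday, station,
% direction, vehicle type, payment method, vehicle count.
M = [ 1  1 0 3 1 1 1 106   6;
      1  1 0 3 1 2 1 106  18;
      2  4 0 3 2 1 1 101  16;
      2  5 0 3 2 1 1 106 159;
      3 17 0 3 2 1 2 106   5;
      4 15 0 3 2 2 1 101  12;
      5 19 0 3 2 2 1 106 182;
      6  1 0 3 3 1 1 106  98;
      7  1 0 3 3 1 2 106   6;
      8  1 0 3 3 2 1 106  67;
      9  1 0 3 3 2 2 106   6;
     10  1 0 3 4 1 1 106  59;
     11  1 0 3 4 1 2 106   1;
     12  1 0 3 4 2 1 106 106];

% Assumption: non-leap year. monthStart(m) = days elapsed before month m begins.
monthStart = cumsum([0; 31; 28; 31; 30; 31; 30; 31; 31; 30; 31; 30]);
dayOfYear  = monthStart(M(:,1)) + M(:,2);   % 1..365 for every row

isCash = M(:,8) == 101;   % logical masks, no find() needed
isElec = M(:,8) == 106;

% accumarray adds up the counts in column 9 that share the same day index.
daily = zeros(365, 2);
daily(:,1) = accumarray(dayOfYear(isCash), M(isCash,9), [365 1]);
daily(:,2) = accumarray(dayOfYear(isElec), M(isElec,9), [365 1]);

disp(daily(1,:))   % cash and electronic totals for January 1
```

On the sample rows, day 1 comes out as 0 cash and 24 electronic (6 + 18), and no loop over days is needed. Something like plot(1:365, daily) would then draw both series over the year.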

and a plotting example:

>> plot(mydata(elec_rows,1), mydata(elec_rows,9), mydata(cash_rows,1), mydata(cash_rows,9));
>> xlabel('month');ylabel('cars');title('cars per month');
>> legend('electronic payment', 'cash payment');

example data plotting

(It's not quite right for month 1, since there are two entries for electronic payment in month 1, but you could sum the data by month, or by day, before plotting.)
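Summing by month before plotting could look roughly like this. It is a sketch using accumarray on the sample rows; the variable names elec_by_month and cash_by_month are made up for the example.

```octave
% Sample rows from the question (columns: month, day, hour, weekday,
% station, direction, vehicle type, payment method, vehicle count).
mydata = [ 1  1 0 3 1 1 1 106   6;
           1  1 0 3 1 2 1 106  18;
           2  4 0 3 2 1 1 101  16;
           2  5 0 3 2 1 1 106 159;
           3 17 0 3 2 1 2 106   5;
           4 15 0 3 2 2 1 101  12;
           5 19 0 3 2 2 1 106 182;
           6  1 0 3 3 1 1 106  98;
           7  1 0 3 3 1 2 106   6;
           8  1 0 3 3 2 1 106  67;
           9  1 0 3 3 2 2 106   6;
          10  1 0 3 4 1 1 106  59;
          11  1 0 3 4 1 2 106   1;
          12  1 0 3 4 2 1 106 106];

isElec = mydata(:,8) == 106;
isCash = mydata(:,8) == 101;

% One total per month (rows 1..12); months with no matching rows stay at 0.
elec_by_month = accumarray(mydata(isElec,1), mydata(isElec,9), [12 1]);
cash_by_month = accumarray(mydata(isCash,1), mydata(isCash,9), [12 1]);

% Columns: month, electronic total, cash total.
disp([(1:12)' elec_by_month cash_by_month])
```

With one value per month, plot(1:12, elec_by_month, 1:12, cash_by_month) gives a single point per month for each payment method, so the duplicate entries for month 1 are merged into one total of 24.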
