MATLAB function: improving speed/performance of code

I am attempting to write a function (see code below) that will create a list of airports at which more than 1,000 flights either departed or arrived, then provide a vector that contains the indices of each of those airports in (x,y) format. Both the airport and flight data are housed in separate structs of dimensions 1x6267 and 1x136010, respectively, and every airport and flight has a unique ID number that corresponds to its column in the struct. Although the code I wrote does succeed in identifying all of the airports with 1000+ combined arrivals/departures, the function takes almost 20 mins to execute and only provides the ID numbers of the airports, not their indices in the airports struct. What changes do I need to make to the code in order to get its run time under 5 mins and how can I create a vector of (x,y) indices for the airports? Any help would be greatly appreciated! Thanks!

P.S. I'm rather new to MATLAB, so I apologize in advance if these questions seems silly or obvious.

function list = Problem10( flights, aircraft, airlines, airports )
a=zeros(1,(length(airports)));
for id=1:length(airports)
    a(id)= FlightsToFrom(flights, id);
end
a(a==0) = [];
[list]=[sort(a)]
end

function tofrom = FlightsToFrom(flights, ID)
sum=0;
for ii=1:length(flights)
    if (isequal (flights(1,ii).from_id, ID))||(isequal (flights(1,ii).to_id, ID))
        sum=sum+1;
        if sum > 1000
            break;
        end
    end
end
if sum <=1000
    tofrom=0;
else
    tofrom=ID;
end
end

Here are some examples of what the database looks/behaves like:

(In Workspace)
airports <1x6267 struct>
aircraft <1x384 struct>
airlines <1x1559 struct>
flights <1x136010 struct>

(Inside of struct)
flights(1,6) <1x1 struct>
**Field**       **Value**
airline_id    60
from_id       967
to_id         6252
aircraft_id   18
distance      32
airtime       19
passengers    0
month         1

flights(1,6).from_id <1x1 double>
967


airport(1,176) <1x1 struct>
**Field**       **Value**
code          'AEX'
name          'Alexandria, LA: Alexandria International'

airport(1,176).name <1x40 char>
'Alexandria, LA: Alexandria International'

(In Command Window)

>> airports (2866)

ans = 

    code: 'LAX'
    name: 'Los Angeles, CA: Los Angeles International'

>> airports (1, 2866)

ans = 

    code: 'LAX'
    name: 'Los Angeles, CA: Los Angeles International'

>> airports (4703)

ans = 

    code: 'SEA'
    name: 'Seattle, WA: Seattle/Tacoma International'

>> airports (1, 4703)

ans = 

    code: 'SEA'
    name: 'Seattle, WA: Seattle/Tacoma International'

>> flights (4736)

ans = 

     airline_id: 31
        from_id: 1635
          to_id: 1062
    aircraft_id: 194
       distance: 118
        airtime: 1792
     passengers: 1657
          month: 1

>> flights (1, 4736)

ans = 

     airline_id: 31
        from_id: 1635
          to_id: 1062
    aircraft_id: 194
       distance: 118
        airtime: 1792
     passengers: 1657
          month: 1

>> flights(1,7369).to_id

ans =

   830

>> flights(1,7369).from_id

ans =

        1047

Upvotes: 0

Answers (3)

Yair Altman

Reputation: 5722

Your entire FlightsToFrom() function can be fully vectorized, leading to the following (with a few additional small improvements):

function a = Problem10( flights, aircraft, airlines, airports )
    numAirports = length(airports);
    a(numAirports) = 0;  % preallocate: faster than zeros()
    from_ids = [flights.from_id];
    to_ids   = [flights.to_id];
    for id = 1 : numAirports
        a(id) = id * (sum(from_ids==id | to_ids==id) > 1000);
    end
    a(~a) = [];
    %a = sort(a);  % unnecessary - a is already sorted at this stage, by ascending ID!
end

This can probably be further vectorized, but I think that the resulting speedup from just these small changes should be sufficient to make any further investment in performance tuning an academic rather than a practical concern.

Upvotes: 2

Daniel

Reputation: 36710

To speed up, I would suggest

function tofrom = FlightsToFrom(flights, ID)
assert(~exist('sum','var'))
nflights=sum([flights.fromId]==ID)+sum([flights.toId]==ID);
if nflights <=1000
    tofrom=0;
else
    tofrom=ID;
end
end

For your second question, how does airports look like? Currently you are using your loop index, is this identical to the airport id?

If possible, provide some example data. For example such a pice of code to generate random input data which matches your real data:

numOfAirports=100
numOfFlights=10000
for idx=1:numOfFlights
    flights(idx).fromId=randi(numOfAirports);flights(idx).toId=randi(numOfAirports);
end

Upvotes: 0

zetches

Reputation: 148

What takes time in your code is looping through the entire flight list 6267 times, as you call the FlightsToFrom function for every single airport... If you can solve it with just one looping through the flights, the program will run 6267 times faster, which is a fraction of a second. I'm not exactly sure what you mean by "a vector of indices x,y for the airports" -- what should this vector contain? In any case, you could do the looping at least this way:

a=zeros(length(airports),length(airports));
for id = 1:length(flights)
     a[flights(1,id).from_id, flights(1,id).to_id] = a[flights(1,id).from_id, flights(1,id).to_id]  + 1; 
end

At the end of this, a is a matrix that contains the number of times a flight took place from the different airports. Eg. a[5,6] will be the number of times there was a flight from airport 5 to airport 6. You can then do stuff to this matrix using MATLAB's in-built functions, eg.

[row, col] = find(a>1000);

will give you the coordinates of the to/from of the flights that happened more than 1000 times.

highdepartures = find(sum(a,1)>1000)); 
highdarrivals = find(sum(a,2)>1000));

will give you a list of the coordinates of the airports with the highest number of departures/arrivals, respectively.

Upvotes: 0

MATLAB function: improving speed/performance of code

Answers (3)

Related Questions