Reputation: 3
I am attempting to write a function (see code below) that will create a list of airports at which more than 1,000 flights either departed or arrived, then provide a vector that contains the indices of each of those airports in (x,y) format. Both the airport and flight data are housed in separate structs of dimensions 1x6267 and 1x136010, respectively, and every airport and flight has a unique ID number that corresponds to its column in the struct. Although the code I wrote does succeed in identifying all of the airports with 1000+ combined arrivals/departures, the function takes almost 20 mins to execute and only provides the ID numbers of the airports, not their indices in the airports struct. What changes do I need to make to the code in order to get its run time under 5 mins and how can I create a vector of (x,y) indices for the airports? Any help would be greatly appreciated! Thanks!
P.S. I'm rather new to MATLAB, so I apologize in advance if these questions seems silly or obvious.
function list = Problem10( flights, aircraft, airlines, airports )
a=zeros(1,(length(airports)));
for id=1:length(airports)
a(id)= FlightsToFrom(flights, id);
end
a(a==0) = [];
[list]=[sort(a)]
end
function tofrom = FlightsToFrom(flights, ID)
sum=0;
for ii=1:length(flights)
if (isequal (flights(1,ii).from_id, ID))||(isequal (flights(1,ii).to_id, ID))
sum=sum+1;
if sum > 1000
break;
end
end
end
if sum <=1000
tofrom=0;
else
tofrom=ID;
end
end
Here are some examples of what the database looks/behaves like:
(In Workspace)
airports <1x6267 struct>
aircraft <1x384 struct>
airlines <1x1559 struct>
flights <1x136010 struct>
(Inside of struct)
flights(1,6) <1x1 struct>
**Field** **Value**
airline_id 60
from_id 967
to_id 6252
aircraft_id 18
distance 32
airtime 19
passengers 0
month 1
flights(1,6).from_id <1x1 double>
967
airport(1,176) <1x1 struct>
**Field** **Value**
code 'AEX'
name 'Alexandria, LA: Alexandria International'
airport(1,176).name <1x40 char>
'Alexandria, LA: Alexandria International'
(In Command Window)
>> airports (2866)
ans =
code: 'LAX'
name: 'Los Angeles, CA: Los Angeles International'
>> airports (1, 2866)
ans =
code: 'LAX'
name: 'Los Angeles, CA: Los Angeles International'
>> airports (4703)
ans =
code: 'SEA'
name: 'Seattle, WA: Seattle/Tacoma International'
>> airports (1, 4703)
ans =
code: 'SEA'
name: 'Seattle, WA: Seattle/Tacoma International'
>> flights (4736)
ans =
airline_id: 31
from_id: 1635
to_id: 1062
aircraft_id: 194
distance: 118
airtime: 1792
passengers: 1657
month: 1
>> flights (1, 4736)
ans =
airline_id: 31
from_id: 1635
to_id: 1062
aircraft_id: 194
distance: 118
airtime: 1792
passengers: 1657
month: 1
>> flights(1,7369).to_id
ans =
830
>> flights(1,7369).from_id
ans =
1047
Upvotes: 0
Views: 178
Reputation: 5722
Your entire FlightsToFrom()
function can be fully vectorized, leading to the following (with a few additional small improvements):
function a = Problem10( flights, aircraft, airlines, airports )
numAirports = length(airports);
a(numAirports) = 0; % preallocate: faster than zeros()
from_ids = [flights.from_id];
to_ids = [flights.to_id];
for id = 1 : numAirports
a(id) = id * (sum(from_ids==id | to_ids==id) > 1000);
end
a(~a) = [];
%a = sort(a); % unnecessary - a is already sorted at this stage, by ascending ID!
end
This can probably be further vectorized, but I think that the resulting speedup from just these small changes should be sufficient to make any further investment in performance tuning an academic rather than a practical concern.
Upvotes: 2
Reputation: 36710
To speed up, I would suggest
function tofrom = FlightsToFrom(flights, ID)
assert(~exist('sum','var'))
nflights=sum([flights.fromId]==ID)+sum([flights.toId]==ID);
if nflights <=1000
tofrom=0;
else
tofrom=ID;
end
end
For your second question, how does airports look like? Currently you are using your loop index, is this identical to the airport id?
If possible, provide some example data. For example such a pice of code to generate random input data which matches your real data:
numOfAirports=100
numOfFlights=10000
for idx=1:numOfFlights
flights(idx).fromId=randi(numOfAirports);flights(idx).toId=randi(numOfAirports);
end
Upvotes: 0
Reputation: 148
What takes time in your code is looping through the entire flight list 6267 times, as you call the FlightsToFrom function for every single airport... If you can solve it with just one looping through the flights, the program will run 6267 times faster, which is a fraction of a second. I'm not exactly sure what you mean by "a vector of indices x,y for the airports" -- what should this vector contain? In any case, you could do the looping at least this way:
a=zeros(length(airports),length(airports));
for id = 1:length(flights)
a[flights(1,id).from_id, flights(1,id).to_id] = a[flights(1,id).from_id, flights(1,id).to_id] + 1;
end
At the end of this, a
is a matrix that contains the number of times a flight took place from the different airports. Eg. a[5,6] will be the number of times there was a flight from airport 5 to airport 6. You can then do stuff to this matrix using MATLAB's in-built functions, eg.
[row, col] = find(a>1000);
will give you the coordinates of the to/from of the flights that happened more than 1000 times.
highdepartures = find(sum(a,1)>1000));
highdarrivals = find(sum(a,2)>1000));
will give you a list of the coordinates of the airports with the highest number of departures/arrivals, respectively.
Upvotes: 0