Reputation: 777
I am a grader for a statistics course and have a series of paper homework assignments given to me in random order. Part of my job is to alphabetize them. I have been using a method similar to quick-sort, but other graders have used different methods. I want an efficient sorting method, with justification, for when I have a "large" number of exams, with justification provided.. Here are some specifics I have leveraged:
My method thus far has been to find the median last letter (ie: if there are 60 papers, pick the last name letter corresponding to the 30th person) of the class roster, treat it as a pivot point, and put all letters above the median in one pile, and all the letters below in another. If a letter is the same as the median, I place it in the median pile. I now do the same thing on the above/below-median piles. When the piles are small enough that there will only be three or four letters in a stack, I make one stack for each letter, then fold the stacks into a master stack, alphabetically.
Are there any algorithms specifically designed for alphabetizing, or something that is more efficient on average than my method? One method that seemed to do okay was to make a stack for each letter (26 piles, worst case), but this consumes so much space that it is not feasible for one desk.
Upvotes: 16
Views: 6467
Reputation: 44250
M
pilesM
piles: put all the items with non-matching begin letters on a trash-pileM
piles are completeThe constant M
can be tuned to match your ability to match&put multiple letters at first sight. (and the avalaible disk space)
In practice you will not need more than a few runs, for reasonable values of M
. (Zipf / Pareto law)
Upvotes: 2
Reputation: 4110
From the description of the problem, I believe the most efficient method is similar to Donald Knuth's recommendation for how to sort a deck of cards with two bucket sorts:
Congratulations, your exams are now sorted according to your specifications. The sort can be done in time O(n), handling each test exactly twice.
It is possible to reduce the number of buckets in step 1 by processing the class list. A boundary between buckets is necessary only for separating first initials which share the same last initial. For example, if a list of initials in the class is AM, BD, LD, RM, CN, BH, you would only need to separate BD from LD and AM from RM in the first pass, so buckets A-K and L-Z would suffice on the first pass. However, for a large class, the reduction in the number of buckets would likely be small, and wouldn't provide much advantage anyway because you have to handle all 300 tests on this step anyway. Furthermore, it complicates the situation, requiring a computer program to process the class last, and makes the situation unstable, requiring a change if the class list changes. Since you're going to need space for approximately 26 piles in step 3 anyway, you might as well sort them into 26 piles in step 1.
You could get the students to do step 1 themselves when they hand the tests in.
Upvotes: 0
Reputation: 21
My department has a basic course with typically 500-600 students writing the exam. From a trail & error approach it seems that we get the sorting done the fastest by first doing a bucket sort with roughly one bucket per letter. The letter S can typically be divided into two buckets while the letters in the end of alphabet (x,y,z) can usually share one bucket. We then sort within each bucket by insertion sorting and finish by stacking the buckets.
For small classes (up to around 30) direct insertion sorting is viable, but the time required to find the correct position to insert quickly gets out of hand when the pile grows.
Upvotes: 2
Reputation: 634
This is a great question! We conducted a small experiment to come closer to an answer.
Our set-up consisted of
3 sorters (A, B and C).
3 stacks of 40 student problem sets (one for each sorter). The number of sheets of a problem set ranged from 1 to 5. The sheets were stapled and had student names written on the top of the first page.
3 sorting algorithms to sort the stacks alphabetically:
Measurements:
The order of sorting algorithms was randomized.
Every new round the problem set stacks were exchanged among sorters and shuffled for several minutes.
Sorters A and B each did 9 rounds, sorter C did 3 rounds.
A sheet with the alphabet and bucket sort cutoffs was put up on each sorter's table.
Here is a picture of our set-up.
And here are the results.
Two conclusions are immediate.
We speculate that the initial step of first dividing the stack of problem sets into 10 piles may contribute to merge sort's lackluster results. Future researchers may find that merge-like algorithms combined with more efficient set-up procedures are effective.
We speculate that the relative efficiency of bucket sort would improve as the number of problem sets to sort grows beyond 40 and that insertion sort would dominate for stacks of 30 or fewer, although more testing is needed. There were no clear differences in accuracy between bucket and insertion sorts.
Lastly, we note that there are important individual differences in sorting ability among our test subjects. Sorter B consistently outperformed sorters A and C by on average 39 and 101 seconds respectively. This suggests that although the sorting procedure employed is important to sorting speed, ability may explain at least as large a share of the variance in individual outcomes. Exploring what makes Germans such fantastic sorters is a promising area for future research.
Upvotes: 7
Reputation: 1095
I've based my algorithm on the premise that the time it takes me to determine the proper order for two elements is not consistent. For example, it's easy for me to say that A belongs before D, but takes me to decide whether Q comes before T or vice-versa (generally speaking, the closer the letters are to the end of the alphabet and to each other, the more likely it is that I'll have to mentally recite the alphabet to be sure).
Given this, I find it lessens the tedious alphabet-reciting if I divide the tests into alphabetical "chunks":
There is overlap - i.e. sometimes I'll feel like a Q is Late Middle and sometimes I'll feel like it's End. It kind of depends on how big the piles are at that point and what letter I last sorted... my theory is that the time saved by not spelling out the alphabet in my head all the time is greater than the extra time spent sorting later on.
That's as far as I've gotten, however. Beyond the initial chunking, I can never really decide what's most efficient...
Upvotes: 2
Reputation: 95
I use bucket sort. Use four buckets and again sort each bucket by using another 4-bucket sort, sort each sub-bucket (1/16) by brute-force!
Upvotes: 2
Reputation: 1437
I was looking around at some websites that were talking about algorithms for humans to use, and one that I saw was doing a sort of insertion sort, where you put the one at hand into the pile by putting it directly where it's correct ordering should be.
The inefficiency of this would probably be from having to scan through the pile to find the location as the pile gets bigger, so I'm thinking that to adjust for this, you can add a tag or something that would act as an index for a specific alphabetical location. Since you don't care about the alphabetical ordering aside from the first letter, this would basically put your insertion cost at O(1)
This is just a thought I had while thinking about it, so I have no actually tried it myself, and am unable to say with regards to how effective it would be with sufficiently large piles. But I think that it should work fairly well, as the tags would give you instant access to the location you want to insert.
Upvotes: 2
Reputation: 2883
Your last paragraph is insertion sort. If 26 piles are two many, use 24 :). If 26 piles are too many, split the alphabet and the exams into 5 piles. Then sort each pile, again you'll have 5 cases (one with 6).
Upvotes: 1
Reputation: 13765
Quicksort is probably not the best as its efficiency depends on the pivot choice. Anyways, with just 300 exams what I'd do is create 26 piles (one for each letter) and just make one pass for all exams putting them into the appropriate piles
Upvotes: 0