Reputation: 777

Best algorithm to sort exams

I am a grader for a statistics course and have a series of paper homework assignments given to me in random order. Part of my job is to alphabetize them. I have been using a method similar to quick-sort, but other graders have used different methods. I want an efficient sorting method, with justification, for when I have a "large" number of exams, with justification provided.. Here are some specifics I have leveraged:

I have a roster which contains an alphabetized list of all the names I should see.
I don't care to get the names more alphabetized than just the first letter. For example, I am fine if "Smith, John" comes before "Salk, Jonas" .
I will never have to sort more than 300 objects.

My method thus far has been to find the median last letter (ie: if there are 60 papers, pick the last name letter corresponding to the 30th person) of the class roster, treat it as a pivot point, and put all letters above the median in one pile, and all the letters below in another. If a letter is the same as the median, I place it in the median pile. I now do the same thing on the above/below-median piles. When the piles are small enough that there will only be three or four letters in a stack, I make one stack for each letter, then fold the stacks into a master stack, alphabetically.

Are there any algorithms specifically designed for alphabetizing, or something that is more efficient on average than my method? One method that seemed to do okay was to make a stack for each letter (26 piles, worst case), but this consumes so much space that it is not feasible for one desk.

Upvotes: 16

Answers (9)

wildplasser

Reputation: 44250

sort on the first letter into M piles
once you need >= M piles: put all the items with non-matching begin letters on a trash-pile
at the end of the first run: the M piles are complete
recurse, using the leftovers from the trash pile

The constant M can be tuned to match your ability to match&put multiple letters at first sight. (and the avalaible disk space)

In practice you will not need more than a few runs, for reasonable values of M. (Zipf / Pareto law)

Upvotes: 2

Edward Doolittle

Reputation: 4110

From the description of the problem, I believe the most efficient method is similar to Donald Knuth's recommendation for how to sort a deck of cards with two bucket sorts:

Distribute the tests into 26 buckets by first initial.
Gather the piles in reverse order, with the Z pile on top and the A pile on bottom.
Distribute the tests (starting from the top of the single big pile from step 2) into 26 buckets by last initial.
Gather the piles in forward order, with the A pile on top and the Z pile on bottom.

Congratulations, your exams are now sorted according to your specifications. The sort can be done in time O(n), handling each test exactly twice.

It is possible to reduce the number of buckets in step 1 by processing the class list. A boundary between buckets is necessary only for separating first initials which share the same last initial. For example, if a list of initials in the class is AM, BD, LD, RM, CN, BH, you would only need to separate BD from LD and AM from RM in the first pass, so buckets A-K and L-Z would suffice on the first pass. However, for a large class, the reduction in the number of buckets would likely be small, and wouldn't provide much advantage anyway because you have to handle all 300 tests on this step anyway. Furthermore, it complicates the situation, requiring a computer program to process the class last, and makes the situation unstable, requiring a change if the class list changes. Since you're going to need space for approximately 26 piles in step 3 anyway, you might as well sort them into 26 piles in step 1.

You could get the students to do step 1 themselves when they hand the tests in.

Upvotes: 0

Rickard Shen

Reputation: 21

My department has a basic course with typically 500-600 students writing the exam. From a trail & error approach it seems that we get the sorting done the fastest by first doing a bucket sort with roughly one bucket per letter. The letter S can typically be divided into two buckets while the letters in the end of alphabet (x,y,z) can usually share one bucket. We then sort within each bucket by insertion sorting and finish by stacking the buckets.

For small classes (up to around 30) direct insertion sorting is viable, but the time required to find the correct position to insert quickly gets out of hand when the pile grows.

Upvotes: 2

robust

Reputation: 634

This is a great question! We conducted a small experiment to come closer to an answer.

Our set-up consisted of

3 sorters (A, B and C).
3 stacks of 40 student problem sets (one for each sorter). The number of sheets of a problem set ranged from 1 to 5. The sheets were stapled and had student names written on the top of the first page.
3 sorting algorithms to sort the stacks alphabetically:
- Insertion: Take top item from unsorted pile and insert into correct position in sorted pile. Fanning out the sorted pile is allowed.
- Bucket: Sort each item into one of five buckets (A-E, F-J, K-O, P-T, U-Z). Then sort each bucket using insertion sort. Combine sorted buckets.
- Merge: Divide items into 10 piles. Sort each pile using insertion sort. Put 10 sorted piles into 5 pairs. Merge each pair by repeatedly looking at the top items of the pair and putting the alphabetically higher one on the bottom of the pair's resulting pile. After merging 10 piles into 5, merge 2 of the 5 piles, so that there are 4 piles left. Then, repeatedly merge pairwise until a single sorted pile remains.
Measurements:
- Time until completion of the sorting algorithm.
- The number of misplaced items (measured by other sorter).
The order of sorting algorithms was randomized.
Every new round the problem set stacks were exchanged among sorters and shuffled for several minutes.
Sorters A and B each did 9 rounds, sorter C did 3 rounds.
A sheet with the alphabet and bucket sort cutoffs was put up on each sorter's table.

Here is a picture of our set-up.

And here are the results.

Two conclusions are immediate.

The relatively complex merge sort algorithm preformed poorly. Merge sorts consistently took 57 to 125% longer than within sorter bucket / insertion sort averages with no obvious accuracy gains.

We speculate that the initial step of first dividing the stack of problem sets into 10 piles may contribute to merge sort's lackluster results. Future researchers may find that merge-like algorithms combined with more efficient set-up procedures are effective.

Although bucket and insertion sort both performed well, bucket sort was 13 to 25% faster than insertion sort within sorters. This difference corresponds to roughly a minute of time saved for each 40-problem-set sort.

We speculate that the relative efficiency of bucket sort would improve as the number of problem sets to sort grows beyond 40 and that insertion sort would dominate for stacks of 30 or fewer, although more testing is needed. There were no clear differences in accuracy between bucket and insertion sorts.

Lastly, we note that there are important individual differences in sorting ability among our test subjects. Sorter B consistently outperformed sorters A and C by on average 39 and 101 seconds respectively. This suggests that although the sorting procedure employed is important to sorting speed, ability may explain at least as large a share of the variance in individual outcomes. Exploring what makes Germans such fantastic sorters is a promising area for future research.

Upvotes: 7

user1476176

Reputation: 1095

I've based my algorithm on the premise that the time it takes me to determine the proper order for two elements is not consistent. For example, it's easy for me to say that A belongs before D, but takes me to decide whether Q comes before T or vice-versa (generally speaking, the closer the letters are to the end of the alphabet and to each other, the more likely it is that I'll have to mentally recite the alphabet to be sure).

Given this, I find it lessens the tedious alphabet-reciting if I divide the tests into alphabetical "chunks":

Beginning (A-F ish)
Early Middle (G-K ish)
Late Middle (L-P ish)
End (Q-Z ish). This one is bigger because (a) it's the sector where I'm worst at deciding on the order of letters and (b) a few of these letters don't often begin last names

There is overlap - i.e. sometimes I'll feel like a Q is Late Middle and sometimes I'll feel like it's End. It kind of depends on how big the piles are at that point and what letter I last sorted... my theory is that the time saved by not spelling out the alphabet in my head all the time is greater than the extra time spent sorting later on.

That's as far as I've gotten, however. Beyond the initial chunking, I can never really decide what's most efficient...

Upvotes: 2

Parsa

Reputation: 95

I use bucket sort. Use four buckets and again sort each bucket by using another 4-bucket sort, sort each sub-bucket (1/16) by brute-force!

Upvotes: 2

MysticXG

Reputation: 1437

I was looking around at some websites that were talking about algorithms for humans to use, and one that I saw was doing a sort of insertion sort, where you put the one at hand into the pile by putting it directly where it's correct ordering should be.

The inefficiency of this would probably be from having to scan through the pile to find the location as the pile gets bigger, so I'm thinking that to adjust for this, you can add a tag or something that would act as an index for a specific alphabetical location. Since you don't care about the alphabetical ordering aside from the first letter, this would basically put your insertion cost at O(1)

This is just a thought I had while thinking about it, so I have no actually tried it myself, and am unable to say with regards to how effective it would be with sufficiently large piles. But I think that it should work fairly well, as the tags would give you instant access to the location you want to insert.

Upvotes: 2

mpez0

Reputation: 2883

Your last paragraph is insertion sort. If 26 piles are two many, use 24 :). If 26 piles are too many, split the alphabet and the exams into 5 piles. Then sort each pile, again you'll have 5 cases (one with 6).

Upvotes: 1

scibuff

Reputation: 13765

Quicksort is probably not the best as its efficiency depends on the pivot choice. Anyways, with just 300 exams what I'd do is create 26 piles (one for each letter) and just make one pass for all exams putting them into the appropriate piles

Upvotes: 0

Best algorithm to sort exams

Answers (9)

Related Questions