To write and use a Dictionary implementation.
Download Dictionary.java, DictPair.java, and Set.java.
This is optionally a partner assignment.
Write a hash-table–based Dictionary implementation, using separate chaining, called HashDictionaryImplementation
. A few notes:
JavaSetWrapper
.DictPair
interface, in order to return the correct type of object for getEntrySet()
.In order to implement separate chaining, you'll also need to write your own private inner class for nodes. This can get really ugly when it comes to dealing with Java's funky handling of generics, so here's what you should do:
Object
s, even though it's actually an array of nodes.Object
by default). You still need to precede the assignment with @SuppressWarnings("unchecked")
in order to avoid compiler warnings.I've written some example code to demonstrate how this should work.
When it comes to resizing your dictionary, your table size must always be prime, and should approximately double every time you resize (or otherwise the amortized cost of resizing is no longer constant). The right way to do this is to just store a precomputed sequence of prime table sizes, worked out in advance by some smart person, as a static constant. Here's the code for that (make sure to attribute your sources!):
// Nice prime numbers for growing array sizes, from // http://planetmath.org/goodhashtableprimes // (augmented with a final entry computed by Wolfram Alpha). private static final int[] PRIMES = { 13, 23, 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473 };
Note that the final value (which I computed myself) is almost equal to $1.5 \cdot 2^{31}$. This is ¾ of the way to $2^{32} - 1$, which is the maximum value of an unsigned integer, and therefore the maximum possible size of an array on a 32-bit machine architecture. On a 32-bit architecture, Java uses at least 64 bits to store each Integer (32 for the value, and 32 more for the reference to the value), so you'd run out of memory before you could fill an array of size 3221225473. Therefore this value is definitely big enough.
HashDictionaryImplementation
to test the functionality of your Dictionary implementation. In particular, make sure that you test edge cases, and that you add enough data at some point to force your hash table to grow.Use your Dictionary to analyze the text of a document.
Write a separate class, WordFreqs
, whose main method takes a filename from the command line: a text file to analyze. Read through the file word-by-word, ignoring case, punctuation, etc. just like in the last assignment (so “Don't”, for example, would be treated as two words: “don” and “t”). Use your dictionary to keep track of the number of occurrences of each unique word in the file.
Then print out the top 20 most common words in the file, each paired with its normalized frequency: the proportion of all the words in the file that were occurrences of this particular word. (So, for example, if “don” shows up 5 times in a file with 500 words, its normalized frequency is 0.01.)
Make this a polished program: it should fail gracefully and provide helpful feedback to the user where necessary.
Zip up your HashDictionaryImplementation.java and WordFreqs.java (and any additional ADT or implementation files needed for either of these) into hw18.zip and submit it on Moodle.