CS50x Week 8: Mispellings (Problem Set)

This week’s problem set requires us to implement a spell-checker. The usage is speller [dictionary] text, where speller is our program, dictionary is optional, text is a file that contains the paragraphs that need to be spell-checked. If user does not specify a dictionary, a default dictionary will be used.

The main body of the program will be in speller.c. speller.c had been written by the CS50 staff, we just have to implement functions that is called by main function of speller.c. These functions are in dictionary.c. dictionary.h contains declaration of these functions. Do not edit speller.c.

speller.c

In speller.c, load function is called to load dictionary into heap. After that, the text file containing paragraphs of words to be spell-checked is opened. Then, main continue gets character from text file until end of file. When a whole word is met, check function is called to check if the word is in dictionary. If it is not, the misspelled word is printed. Lastly, after all the words are spell-checked, the memory in the heap is freed using unload function.

There are some codes implemented by the CS50 staff to record the time taken for each function to run. Besides the misspelled words, you should see the following information when you run your program:

WORDS MISSPELLED: 644
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 19190
TIME IN load: 0.17
TIME IN check: 0.03
TIME IN size: 0.00
TIME IN unload: 0.09
TIME IN TOTAL: 0.29

dictionary.c

The four functions that you need to implement in dictionary.c are load, check, size, and unload. You can use either a hash table or trie to store your dictionary in heap. Optimise your algorithm to achieve the fastest running time.

Besides achieving fast running time, you also need to ensure no memory is leaked. Use Valgrind to check memory leaks.

My implementation

Load

I used a trie to store my dictionary. Trie is made up of nodes, each node has a structure:

The isWord boolean is to tell if the node is letter of the end of a word. Each node has an array of 27 pointers (26 for alphabets, 1 for apostrophe), each pointer pointing to another node. Each dictionary word has a maximum length of 47, contains alphabet or apostrophe (apostrophe cannot be at the start of a word).

How do we insert a word into trie? We iterate over each character of the word, for each iteration, we check if struct node pointer of the array of node pointed by cursor pointer is NULL, if it is, we malloc some memory for it. If we arrive at the end of the word, we make isWord true. Cursor pointer is a pointer variable that we initialized to keep track of pointers when we go down the trie.

Check

check returns true if word is in dictionary else false. We iterate over each character of a word to be checked. We go to corresponding element in children, if it is NULL, return false because the word is not in the trie. If it is not NULL, we move to the next character. Once we are at the end of the word, we check if isWord is true, if true, return true as the word is in trie.

Size

size returns number of words in dictionary if loaded else 0 if not yet loaded. This function only needs one line.

Unload

unload unloads dictionary from memory. Returns true if successful else false. We have to free trie from leaves up to the root. I used a recursive function called freeNode that frees the memory of node and every node allocated under it.

It is recursive because, when I pass in the pointer to root node to the function, it frees the root node’s children by iterating over each child to see if they are NULL, if any child node is not NULL, we proceed to call another freeNode on the child to free the child’s children node. Lastly, we free the node itself.

My codes: https://github.com/shaunlgs/CS50x
Other posts in the series: Harvard CS50x 2014

Leave a Reply