Trie Project

Fluff Link to heading

Fourth term at Fanshawe College was a rough term. It took place during the summer and unlike most other terms, 3 courses were done the first half and the other 3 courses were done the other half. I did not like it since twice the content was pushed with only half the time for my mind to soak the information. Though I did have a lot of fun with the Trie Project in Design Patterns 2 with C++; I was strangely addicted to it and got it done before most people. I didn’t have an addiction like it since my start in Warframe.

First Week Link to heading

I spent my first week figuring out how does a Trie look like? I spent a little too much time on it. I started writing a bit of code, pass some unit tests provided by the professor, and then code myself into some sort of wall. I think my initial design was that a trie had an array of nodes and each node had a pointer to another array of nodes and so on. After the next class I asked him what it looked like and this was my final design for a node with some stuff ripped out:

Struct node{
Node* pointerToParent;
Nibble parentIndex;
Value* pointerToValue;
Node* childrenPointers[16];
};

This time the node has an array of children which makes it a bit easier to comprehend instead of the weird pointer to an array of nodes I was doing earlier. The node having an array of children makes it so I can have the destructor call delete on the children so it automatically cleans up when a trie goes out of scope.

The nibble structure is just a simple struct that has one member which is an unsigned int with only 4 bits allocated for it. The syntax for doing it looks like: unsigned int value : 4; Looking back on it, I could have just used binary AND 15 to make sure my values never go past the value 15 instead of making another class to do the truncation for me.

The reason for using a nibble instead of a byte (from a char based string) is because when you use a byte for storing the key, a lot of the 255 bytes is not usually used most of the time and a lot of memory is wasted. The situation becomes worse when Unicode support is thrown in.

The trie class doesn’t have much. Just a bunch of using this_type = more_annoying_syntax and some constants such as:

Const size_type NUM_OF_NIBBLES = BYTE_TO_NIBBLE_FACTOR * sizeof(key_type::value_type)

The above constant is used for looping through a string or wstring nibble by nibble.

I also have a pointer to a node which is the root of the trie. My original implementation also had pointers for begin and end of the trie.

With everything in place I can finally start inserting stuff into the trie and do the first set of unit tests. The navigation logic for looks like:

For each(char_type ch in key)
For(int nibbleNum = NUM_OF_NIBBLES – 1; nibbleNum >= 0; --nibbleNum)
Nibble = ch >> NIBBLE_SIZE * nibbleNum;
// do stuff.

I bit shift the values for the current nibble and the nibble data type will truncate everything else so I don’t have to worry about going out of index bounds. I find it to be pretty neat.

Visual representation after the statement: trieInstance[“bat”] = 5 using my excelent MS Paint skills.

Layout of trie for the entry bat

Each nibble is used as an index to the pointer to the next node.

The last node is what ‘bat’ points to. The value 5 is stored there along with the word “bat” in a std::pair. Each node has a pointer to an std::pair since most nodes are not storing a value. It also makes operations such as move and swap easier since I only have to copy over a pointer to a pair instead of the entire contents of the pair.

Week 2 Link to heading

Week 2 was the week I worked with iterators! It was a fun week. Half the time I spent was debugging my insert and delete function because of if else statements used to determine beginning and end nodes.

The iterators were one of the few times I used recursion in programming; the class mates I talked to used loops.

Looking back, one of the things I am unhappy about was the ’end’ node. There is no formal definition for end, it is just a point that is marked as past the last element in a container. I made my ’end’ as one memory address past the last node so if I wanted to check if I am at end I would have to have a reference to the trie and see if the node is at the end.

I could probably have used nullptr to determine end, but I also use it to determine the root node since the parent pointer of that is nullptr and it would be really confusing if nullptr could mean more than the two things I am using it for (root node and a child index that doesn’t point to anything). Then again, end pointer is the first thing I check before doing the recursive searching so maybe nullptr would have worked well.

Reverse iterator was easy to implement since the standard template library has a wrapper class for reverse iterators; all I had to do was make my trie iterator follow the iterator interface.

Last Day Link to heading

After iterators were complete, copy constructors and other features were very easy to implement; it was a smooth experience.

Submission Week Link to heading

On the week the trie project was due I made one change, begin and end functions will search for the begin and end nodes instead of add and delete functions managing it in if statement blocks. After that was done, not only could I have saved several hours debugging the if statements, I also pass the professor’s unit tests faster. Waaaaahhhhhhh!

Trie Project

Fluff Link to heading

First Week Link to heading

Week 2 Link to heading

Last Day Link to heading

Submission Week Link to heading

Other Notes Link to heading

Nodes Storing Count Link to heading

Pruning Link to heading