Data Structures!

This part is especially important – you’re going to learn a lot of data structures, and it’s critical that you understand the differences between them and where / when to use them.

0. Arrays / Lists

I’m only mentioning them here for completeness. Of course you know what they are. One thing to mention for those of you who go onto other programming languages, is that in Python we don’t have what other languages would call traditional arrays; we have “arraylists” or “vectors”. In Python, confusingly enough, these vectors are called “lists” or “arrays”.

1. Stacks

You’ve already studied stacks and stack frames when it came to recursion. The key thing to remember is that stacks grow downward and they operate on a last in first out policy (LIFO). The last (most recent) thing placed on the stack is the first to be removed from the stack.

One point of terminology: putting something onto a stack is called pushing onto the stack, and removing something of a stack is called popping from the stack.

It takes O(1) to push onto a stack, and O(1) to pop off a stack.

2. Queues

You are also very familiar with queues. Anytime you’ve waited in “line”, as the Americans say, you are part of a queue. Queues, unlike stacks, are first in first out (FIFO). The first person in line is the first person serviced. They are the opposite of stacks!

Just like a queue, it takes O(1) to push and pop onto/off a queue.

3. Linked List

A linked list is slightly different than a regular list. You cannot index into a linked list. A linked list is made up of nodes, which contain the data. Think of these as little cells that have some data inside of them.

There are two types of linked lists: singly and doubly linked.

In a singly linked list, each node has two values: the data it stores and a next attribute, which points to the next node in the list.

Here is an example of a Node class:

class Node:
    
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

That’s all that’s inside of a node. Here is an example of creating a linked list.

a = Node(4)
a.next = Node(6)
a.next.next = Node(8)

So our linked list has 3 nodes, and the order of their data is 4, 6, 8. Notice our variable a only points to a single node; so we cannot index into the linked list. Instead, we have to use a.next if we want to get the element after a.

There are also doubly linked lists. Here, the nodes have both a next and a prev (previous) instance variables, so we can go backwards as well as forwards through the linked list.

class Node:

    def __init__(self, data, next=None, prev=None):
        self.data = data
        self.prev = prev
        self.next = next

Adding to a linked list and removing from a linked list both take O(1) time. Searching through a linked list, however, takes O(n) time where n is the number of elements in the linked list.

4. Dictionary

A dictionary is a very cool data structure that is used for finding things very quickly. In fact, in most cases (read: average / best case) it can find things you need in O(1)! In the worst case, however, it takes O(n).

How does this work?

Dictionaries use a hash function; a good hash function maps every unique string to a unique number. How it does so is out of the scope of this class; suffice it to say some complex math transforms the group of characters to a unique number.

Think of the dictionary itself as a list of linked lists. So this number returned by the hash function could be anything. How do we take that number and assign it an index? Let’s use the modulus operator. If we mod the output of the hash function by the size of the dictionary, we get an index in which we can store the value!

Let’s say our hash function returned 1000 for the word 'bat'. But the list in the dictionary is only of size 10. So 1000 % 10 is 0, so we assign it to the zeroeth index. But what happens if the word 'cat' produces a hash value of 5000? The indices are the same!

This is why the dictionary is a list of linked lists. We simply add the value to the end of the linked list. Thus, the larger the dictionary, the better the outcome.

When two unique words are assigned the same index, we call that a collision. In the worst case, the dictionary size is 1, so it’s just a linked list. Thus, in the worst case, searching through a dictionary is the same as searching through a linked list, O(n). But this rarely ever happens.

Adding and removing from a dictionary takes O(1) time because it is the same as a linked list.

5. Graph

Graphs are data structures comprised of two things: nodes and edges. Graphs look a lot like those brainstorming diagrams we drew in elementary school where clouds had ideas, and you connected related clouds with simple lines.

source

Except here, isntead of brainstorming diagrams, we have neat circles and straight lines connecting them. Think of them as grown up brainstorming diagrams who have learned obedience. :)

In graphs, edges can either be undirected or directed. Think of directed edges as one-way roads; you can’t travel backwards on them. Undirected edges, on the other hand, are two-way streets.

An example of an undirected graph.

source

Although theoretically the edges could also contain data, in this class, only the nodes will.

6. Tree

Trees are a special kind of graph where you can’t find cycles. What’s a cycle? A cycle is when you can reach the same node by taking a path of unique edges and nodes.* What does that mean? Look at the image above. Imagine you start at 4. Can you reach 4 by taking some route along the graph? Of course you can. Go to 5, then 2, then 3, and voila! You’re back at 4. That’s a cycle.

While a graph is a giant web-like structure, a tree looks like… well, a tree. You never see the root of a tree connecting to the leaves. Like a graph, the edges can be directional or non-directional.

An example of a directed tree (notice the arrows on the edges).

source