# Data Structures And Algorithms Data Structures and Algorithms

22:544:613:SEC03 & 26:198:685SEC03

Homework 4

Instructor: Farid Alizadeh

Due Date: Sunday

Data Structures And Algorithms Data Structures and Algorithms

22:544:613:SEC03 & 26:198:685SEC03

Homework 4

Instructor: Farid Alizadeh

Due Date: Sunday December 12 , 2021, at 11:50PM

Note: This is a strict dealine

last updated on December 4, 2021

Please answer the following questions in electronic form and upload and

submit your files to the Canvas Assignment site before the due date. Make sure

to click on the submit button.

You have two choices for the format.

First Choice: Submit in pdf format all theoretical questions. You may ei-

ther typeset or hand-write your answers. In case of hand-write transform your

work into pdf using apps such as CamScan. You should have only a single

pdf file. For Python scripts, include the text of the script as a separate *.py

file. You should have a single file for Questions 2 & 7. For other questions, if

you use Python to compute your answers include it with the pdf text above.

Second Choice: You can use a single Python Notebook *.ipynb file. You

should use the text format of the notebook for the theoretical results, and the

Python script for the programming parts.

1

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

1. Consider the following set of symbols, along with their frequency of oc-

currence in a document:

‘a’ ‘b’ ‘c’ ‘d’ ‘e’

5 10 3 17 24

Step by step show how the Huffman coding will work in this case. Show

the status of the binary tree at each step. Show the final tree and the final

binary encoding of each symbol.

2. The knapsack problem is an optimization problem that attempts to find

the best combination of items. To motivate the problem, assume that a

vendor-hiker can carry a number of items with him for a steep hike. He

plans to sell these items (for example, bottles of water and soda, cookies,

snacks, etc.) The items are numbered 1 through n. Each item i has selling

price pi and a weight of wi. The hiker can only carry with him a total

weight of W kilograms. Also for each item he has only one, so he has

to decide whether to take the item or not. Therefore, the problem is for

item i whether to take it with him (xi = 1) or not (xi = 0.) so that he

maximizes his revenue, without the total weights of items he carries with

hom exceeding his total weight of W. We assume that demand is high

and the hiker can sell all of his items.

This problem can be formulated as an optimization problem as follows:

max p1x1 + · · ·+ pnxn

s.t. w1x1 + · · ·+ wnxn ≤ W

xi ∈ {0, 1}

For example, let us assume that the items are as in the following table:

item: water bottle bottle of soda pack of chips pack of trail mix

value: $4 $3 $2.5 $1.2

weight: 4 3 2 1

And let’s assume that the hiker can carry a total of 5kg. Then we want

to solve how many of each item the hiker wishes to take with him to get

the maximum revenue.

2a) Propose a greedy algorithm for solving this problem. What is the

complexity of your algorithm? Give an example that shows that your

greedy algorithm actually does not always find the optimal solution.

2

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

2b) Now develop a dynamic programming approach. To do so, since our

goal is to find the optimal values of x1, x2, . . . , xn, define

V(k, w) = the best solution if we can take only

items 1…k and can have maximum value weight of w

Here by “items 1 … k” we mean some arbitrary order which is set at

the beginning and is fixed throughout the execution of the algorithm.

First, find what is the value V(1, w). In other words, if we could only

carry item 1 with weight w1 and price p1, then how may of this item

would we carry and what would be our revenue?

2c) Now show that the following recurrence relation is correct:

V(k, w) = max

{

V(k − 1, w), V(k, w − wk) + pk

}

In words: The optimal value of using the first k items and weight

limit of w is

• either equal to the optimal value using the first k−1 items (that

is the new kth item is not used at all,)

• or it uses the kth item at least once, so the remainder now is

using the items 1 through k but with only weight limit of w−wk.

This remainder must also be optimal for these parameters (the

dynamic programming principle.)

2d) The recurrence above gives only the optimal value of revenue. To

find the actual value of how many of each item we need to use,

that is to determine the variables xi, we first need to set up an-

other variable x(k, w) indicating whether item k is taken or not, when

maximum weight capacity is w and only the first k items are used.

Then x(k, w) = 1 if item k is used with maximum capacity w, and

x(k, w) = 0 when item k is not used. The decision whether to use

item k or not is made when building the value table V(k, w).

From the values of x(k, w) it is possible to determine x1, x2, . . . , xn.

2e) Using the recurrence for V(k, w), and the definition of x(k, w) devise a

dynamic programming algorithm as follows and apply your algorithm

to the example given above along with the table.

• First, set up a values table to record V(k, w) with rows indexed

from 1 to k. In row i we can only use items 1 through i. The

columns are from 1 to w (or actually from v = mini wi, because if

the hiker’s maximum capacity is less than the lightest item, then

he cannot take anything with him.)

• Then explain how you would fill this table row by row, starting

from row 1. In the table in cell [k, w] you should write both

x(k, w) and the optimal value V(k, w).

3

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

p k ↓ w → 1 2 3 4 5

3 1 x, v x, v x, v x, v x, v

3 2 x, v x, v x, v x, v x, v

2.5 3 x, v x, v x, v x, v x, v

1.2 4 x, v x, v x, v x, v x, v

Here, in each cell k, w, the red item is x(k, w) and the black item is

V(k, w)

2f) Explain how from the values of x(k, w) in the table you can recover

the actual values of xi the amount of each item to take. (Hint: Start

from the last cell in the table and work backwards.)

2g) Give a time complexity analysis of the dynamic programming algo-

rithm. Is your algorithm polynomial time?

2h) Write a Python function knapsack(weights, prices, max weight)

which takes a list of weights, a list of prices, and a maximum capacity

and returns an integer list amount which indicates the amount xi of

each item taken, the total revenue rev, the vTble and xTbl the tables

for the x(k, w) and V(k, w). Inside the function you should set up the

tables for V(k, w) and x(k, w) and carry out the computations as

explained above.

3. Consider the following graph with weights on the edges.

4

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

s

a

b

c

d

e

f

g

h

1

2

1

3

5

2

8

3

1 6

2

9

3

6

3a) show the adjacency list representation of the graph.

3b) Run Kruskal’s algorithm to find the smallest spanning tree. Assume

the set of edges are sorted in increasing order of weight (so, no heap

is needed.) At each iteration show the status of the forest as repre-

sented by the data structure for union-find operations. Show the tree

representation of sets only in tree format. Also show the status of

this tree representation after every union-find operation.

3c) Now run Prim’s “grow-a-tree” algorithm starting at node ‘s’. At

each iteration make sure that at each stage to show the partially

constructed tree, and the frontier set.

3d) Now apply Dijkstra’s algorithm find all the shortest paths from vertex

‘s’ to all other vertices. Again, at each stage show the shortest path

to the nodes discovered, and highlight the frontier set.

4. Run the Bellman-Ford algorithm to find the shortest path from vertex ‘a’

in the following graph to all other nodes, or show that there is a negative-

length cycle.

At each iteration show the ordered set of edges, and the relabeling and

the predecessor of each vertex. Use the lexical ordering of the edges, for

5

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

instance edge a-b comes before b-c etc. Determine if the graph has a

negative cycle or not, and if so, how does the Bellman-Ford algorithm

finds it.

a b c

def

2 -1

1

-3

1

-1 8

3

1

5. Consider the following 2-3 tree:

na

k

mh

g

b

d

f

c e i j l

5a) Insert the letter ‘i’ into this tree, and show the status of the tree step

by step until the final result. At each step also show the corresponding

red-black tree with red links leaning left. Show all the rotations and

change of color on the links which are needed.

5b) Apply DeleteMin operation on the resulting tree. Again show the

status of the 2-3 tree step by step. (No need to show the corresponding

6

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

red-black tree operations.) Make sure to show how you backtrack to

eliminate all the illegal “4-nodes” that may have arisen in the process.

6. Consider a hash table of length m = 11 where data are stored using

chaining. Using the multiplicative method, that is the hash function

h(k) = bm(kA mod 1)c with A =

√

5 − 1

2

6a) Insert the keys 1, 2, . . . , 20 into the hash table. Show the final status

of table. (You may write a very short script to find the hash values

of 1,2,. . . ,20. You may also run this homework using a Python script.

However, this is optional. If you do, include your Pythion code as

well.)

6b) Search for number 21, and show the cells you have to search to realize

it is not in the table.

6c) Delete 1 from the table and show the resulting table.

7. Mini-Python project: In this assignment you will run an experiment

to compare several hash functions. Suppose the set of items we are going

to store in the table is the set of integers [1…N]; of course with possible

repetitions. Suppose we allocate a table of length m for this purpose.

7a) The hash functions we are considering are:

h1(k) = bm(kA1 mod 1)c with A1 =

√

5 − 1

2

h2(k) = bm(kA2 mod 1)c with A2 = ln(2)

h3(k) = bm(kA3 mod 1)c with A3 = 0.7

h4(k) = k mod m

7b) Generate N random integers from the range [1, . . . , N] (with repeti-

tion) and put them in a list called L. Experiment with N = 1000, and

m = 11.

7c) Map the numbers in L to an index using the hash functions hi(k)

for i = 1, 2, 3, 4 as above. Put these values in lists called Hi for

i = 1, 2, 3, 4.

7d) Compute the histogram of each array, which produces the frequency

count of each index using the numpy.histogram function.

7

22:544:613SEC03 & 26:711:685SEC03, Fall 2021

Homework 4 Due date: 12/12/2021 at 11:50PM

7e) Plot each of the four histograms using matplotlib.pyplot function

hist.

7f) Compare these four histograms and see if they all “look” reasonably

uniform. Does any of them behave poorly?

8