Data Mining Problems Report –

Programming –

5. Prove Equation 6.3 in the book. (Hint: First, count the number of ways to create an itemset that forms the left-hand side of the rule. Next, for each size-k itemset selected for the left-hand side, count the number of ways to choose items from the remaining d − k items to form the right-hand side of the rule.)
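The counting argument in the hint can be checked numerically for small d. The sketch below is a sanity check, not a proof. It assumes Equation 6.3 is the rule-count formula R = 3^d - 2^(d+1) + 1 (the equation is not reproduced in this report, so that form is an assumption), and all function names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_rules_bruteforce(d):
    """Enumerate every rule X -> Y with X, Y non-empty and disjoint."""
    items = range(d)
    count = 0
    for k in range(1, d):                       # size of the left-hand side
        for lhs in combinations(items, k):
            rest = [i for i in items if i not in lhs]
            for j in range(1, d - k + 1):       # size of the right-hand side
                count += sum(1 for _ in combinations(rest, j))
    return count


def count_rules_double_sum(d):
    """The double sum suggested by the hint: choose k items, then j of the rest."""
    return sum(
        comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
        for k in range(1, d)
    )


def count_rules_closed_form(d):
    """Assumed form of Equation 6.3."""
    return 3**d - 2 ** (d + 1) + 1


for d in range(2, 8):
    print(d, count_rules_bruteforce(d), count_rules_double_sum(d),
          count_rules_closed_form(d))
```

If the three columns agree for every d tried, the closed form is at least consistent with the two-stage count; the proof itself still has to simplify the double sum algebraically (for example with the binomial theorem).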

10. Consider the following set of candidate 3-itemsets:

{1, 2, 3}, {1, 2, 6}, {1, 3, 4}, {2, 3, 4}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}

(a) Construct a hash tree for the above candidate 3-itemsets. Assume the tree uses a hash function where all odd-numbered items are hashed to the left child of a node, while the even-numbered items are hashed to the right child. A candidate k-itemset is inserted into the tree by hashing on each successive item in the candidate and then following the appropriate branch of the tree according to the hash value. Once a leaf node is reached, the candidate is inserted based on one of the following conditions:

Condition 1: If the depth of the leaf node is equal to k (the root is assumed to be at depth 0), then the candidate is inserted regardless of the number of itemsets already stored at the node.

Condition 2: If the depth of the leaf node is less than k, then the candidate can be inserted as long as the number of itemsets stored at the node is less than maxsize. Assume maxsize = 2 for this question.

Condition 3: If the depth of the leaf node is less than k and the number of itemsets stored at the node is equal to maxsize, then the leaf node is converted into an internal node. New leaf nodes are created as children of the old leaf node. Candidate itemsets previously stored in the old leaf node are distributed to the children based on their hash values. The new candidate is also hashed to its appropriate leaf node.

(b) How many leaf nodes are there in the candidate hash tree? How many internal nodes are there?

(c) Consider a transaction that contains the following items: {1, 2, 3, 5, 6}. Using the hash tree constructed in part (a), which leaf nodes will be checked against the transaction? What are the candidate 3-itemsets contained in the transaction?
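The insertion conditions and the transaction lookup in Problem 10 can be sketched in code as a way to cross-check a hand-drawn tree. This is a minimal sketch under stated assumptions: candidates are inserted in the order listed, the tree starts as a single empty leaf at the root, both children are created when a leaf splits, and nodes are named by their branch path (for example "LR" means left, then right). The class and function names are hypothetical.

```python
MAXSIZE = 2   # leaf capacity given in the problem
K = 3         # size of each candidate itemset


def branch(item):
    """Odd-numbered items hash to the left child, even-numbered to the right."""
    return "L" if item % 2 == 1 else "R"


class Node:
    def __init__(self, path=""):
        self.path = path        # branch path from the root; its length is the depth
        self.itemsets = []      # candidates stored while this node is a leaf
        self.children = None    # {"L": Node, "R": Node} once the node is internal

    @property
    def depth(self):
        return len(self.path)


def insert(node, cand):
    # Follow internal nodes, hashing on successive items of the candidate.
    while node.children is not None:
        node = node.children[branch(cand[node.depth])]
    if node.depth == K or len(node.itemsets) < MAXSIZE:
        node.itemsets.append(cand)          # Conditions 1 and 2
        return
    # Condition 3: convert the full leaf into an internal node and redistribute.
    pending = node.itemsets + [cand]
    node.itemsets = []
    node.children = {b: Node(node.path + b) for b in "LR"}
    for c in pending:
        insert(node, c)


def nodes(node):
    """Yield every node of the tree in pre-order."""
    yield node
    if node.children is not None:
        for child in node.children.values():
            yield from nodes(child)


def visited_leaves(node, t, start=0, seen=None):
    """Collect the paths of the leaves reached when hashing transaction t."""
    seen = set() if seen is None else seen
    if node.children is None:
        seen.add(node.path)
        return seen
    need = K - node.depth   # items still to be chosen, including the one hashed here
    for i in range(start, len(t) - need + 1):
        visited_leaves(node.children[branch(t[i])], t, i + 1, seen)
    return seen


candidates = [(1, 2, 3), (1, 2, 6), (1, 3, 4), (2, 3, 4),
              (2, 4, 5), (3, 4, 6), (4, 5, 6)]
root = Node()
for c in candidates:
    insert(root, c)

leaves = [n for n in nodes(root) if n.children is None]
internal = [n for n in nodes(root) if n.children is not None]
transaction = (1, 2, 3, 5, 6)
checked = visited_leaves(root, transaction)
contained = [c for n in leaves if n.path in checked
             for c in n.itemsets if set(c) <= set(transaction)]

for n in leaves:
    print("leaf", n.path, n.itemsets)
print("leaf count:", len(leaves), "internal count:", len(internal))
print("leaves checked:", sorted(checked))
print("candidates contained in the transaction:", contained)
```

Whether children are created eagerly or only on demand is a design choice the problem leaves open; the two choices can give different node counts if a split leaves one child empty, so it is worth stating which convention a written answer uses.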

17. Suppose we have market basket data consisting of 100 transactions and 20 items. Assume the support for item a is 25%, the support for item b is 90%, and the support for itemset {a, b} is 20%. Let the support and confidence thresholds be 10% and 60%, respectively.

(a) Compute the confidence of the association rule {a} → {b}. Is the rule interesting according to the confidence measure?

(b) Compute the interest measure for the association pattern {a, b}. Describe the nature of the relationship between item a and item b in terms of the interest measure.

(c) What conclusions can you draw from the results of parts (a) and (b)?

(d) Prove that if the confidence of the rule {a} → {b} is less than the support of {b}, then:

i. c({ā} → {b}) > c({a} → {b}),

ii. c({ā} → {b}) > s({b}),

where c(·) denotes the rule confidence, s(·) denotes the support of an itemset, and ā denotes the absence of item a.
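The quantities in Problem 17 follow directly from the three given supports, so a short numeric sketch can serve as a check on hand calculations. It assumes the interest measure is the ratio s(a, b) / (s(a) · s(b)), and it reads the left-hand side of the inequalities in part (d) as the confidence of the rule whose antecedent is the absence of a, computed as (s(b) − s(a, b)) / (1 − s(a)). Both readings are assumptions, and the variable names are hypothetical. The last lines only illustrate part (d) for these particular numbers; they are not a proof.

```python
# Supports given in the problem statement.
s_a, s_b, s_ab = 0.25, 0.90, 0.20
minsup, minconf = 0.10, 0.60

# (a) Confidence of {a} -> {b}.
conf_a_b = s_ab / s_a

# (b) Interest measure (assumed form): observed joint support divided by
# the joint support expected if a and b were independent.
interest = s_ab / (s_a * s_b)

# (d) Confidence of (not a) -> {b}: transactions with b but without a,
# divided by transactions without a.
conf_nota_b = (s_b - s_ab) / (1 - s_a)

print("confidence {a} -> {b}:", round(conf_a_b, 4))
print("passes thresholds:", s_ab >= minsup and conf_a_b >= minconf)
print("interest:", round(interest, 4))
print("confidence (not a) -> {b}:", round(conf_nota_b, 4))
print("premise c(a -> b) < s(b):", conf_a_b < s_b)
print("i.  holds here:", conf_nota_b > conf_a_b)
print("ii. holds here:", conf_nota_b > s_b)
```

An interest value below 1 indicates the two items occur together less often than independence would predict, which is the contrast with the confidence result that part (c) asks about.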