Data Structure - Ch1 樹與二元樹 Tree and Binary Tree

2015.1.17 初版
2017.9.7 二版

一、Tree

樹為圖的一種 ($G=(V, E)$，後面章節會提。)，是由 1 個以上的節點所組成的有限集合，滿足至少有一個節點，稱為根，其餘的節點分成 $T_1, T_2, \dots, T_n$ 個互斥集合，稱為子樹。(此定義為遞迴定義，且上述定義隱含樹不可為空，子樹間沒有交集。)

名詞解釋：

Degree of a node : 節點的子樹個數
Degree of a tree : 各節點 degree 的最大值
Leaf : degree 為 0 的節點。
Non-terminal Node : 樹中所有非葉子的節點。
Sibling : 同一個父節點的所有子節點互稱為 sibling。
Ancestor : 某一個節點的祖先，乃是從樹根到該節點路徑中，所經過的所有節點。
Level : 自樹根至該節點的距離。(設根所在的 level 値為 1 或 0。)
Height (Depth) : 一顆樹的各 level 値當中之最大値，即為該樹的高度。
Forest : n 個互斥樹所形成的集合，可以為空。

樹的資料結構表示法

原始做法為直接用 link-list 表示。但會因為空 link 數目太多而浪費空間 (tree degree 愈多浪費比例愈高)。

Lemma : If T is a k-ary tree (a tree of degree k) with n nodes, each having a fixed size (k) linked (child) fields, then n(k − 1)+1 of the nk child fields are 0.

感覺：準備了n*k條，每個node除了root都要被接，因此用了n-1條，n*k-(n-1)條都浪費了
改善方法：將樹化成 binary tree，其中一個方法為 left child-right sibling。

待補：Generalized list

二、Binary Tree

binary tree 為具有 $\geq$ 0 個節點所構成的有限集合。

binary tree 可以為空的樹。
若不為空的樹，則具有 root 及左右子樹，且左右子樹亦是 binary tree。

(表示各節點之 degree 介於 0 與 2 之間，左右子樹有次序之分。)

1. binary tree 常識

二元樹中，第 i 個 level 的節點個數最多有 $2^{i-1}$ 個。
高度 h 的二元樹節點個數最多有 $2^h - 1$ 個。
非空二元樹若 leaf 個數為 $n_0$ 個，degree 為 2 的節點個數為 $n_2$ 個，則 $n_0 = n_2 + 1$。

(1.) (2.) 使用 induction 證明，(3.) 證明如下：

$n = n_0 + n_1 + n_2$ // 節點總數為各種 degree 的節點之合
$n = n_1 + 2n_2 + 1$ // 節點總數為branch數目再+1 (root)
因此，$n_0 = n_2 + 1$

2. 二元樹的種類

Skewed binary tree : 毎個 non-leaf node 皆只有左或右子節點。
Full binary tree : 具有最多節點個數的二元樹。
Complete binary tree : n 個節點之編號與高度 k 的 full binary tree 的前 n 個節點編號對應，不能跳號。

3. Complete binary tree 的常識

假設 Complete binary tree 有 n 個node (編號 1~n)，其中某個 node 其編號為 i，則：

其左兒子編號為 2i (若 2i > n，則左兒子不存在。)
其右兒子編號為 2i+1 (若 2i+1 > n，則右兒子不存在。)
其父點編號為 [i/2] (無條件捨去取整數) (若 [i/2] < 1，則父節點不存在。)

4. Binary tree traversal

DFS : inorder(LVR), postorder(LRV), and preorder(VLR)
BFS : level-order traversal

[注意] 哪些組合可以唯一決定一棵 binary tree？

level-order和中序、前序和中序、後序和中序，這些組合可唯一決定 binary tree；而前序和後序不行唯一決定一棵 binary tree。

[感覺] DFS 遞迴演算法相關用法

Count : return (L+R+1)
Height : return (MAX(L, R)+1)
Swap : 將 root 的左右子樹互換。

5. Binary Search Tree

BST 常應用於 sort 與 search，使用方法如下：

建立：依據資料輸入的順序執行 insert 工作。
排序：先將資料建成二元搜尋樹，依 inorder 追蹤，即可得出排序結果。
搜尋：以 BST 搜尋時左右子樹愈平衡愈好。BST 搜尋時易受資料輸入順序影響。

BST 相關函式：

search() : O(h)
insertion() : O(h)
deletion() : O(h)，刪除較為複雜，分為三種情況，

葉節點，直接刪除即可。
只有一個小孩的節點，刪除該節點並用他的小孩替代他即可。
其餘的情況，刪除該節點並找前一個節點替代他(中序的順序)。

其他操作：ThreeWayJoin()、TwoWayJoin()、Split()。

BST 刪除一個節點 (3.)。

6. Thread Binary Tree

使用原因：

iteration 的中序走訪需要 stack。

規則：把所有節點中原本指向空的指標拿來用

左/右指標指向上/下一個中序順序節點。
最左和最右指向空。

資料結構：

節點結構為 [ Left Thread ][ Lchild ][ Data ][ Rchild ][ Right Thread ]
thread 為 true 時，表示 child 的指標為指向中序排列的 thread 指標。
使用時會再加入一個串列首。

實現 thread BT inorder traversal 中的 Next() function :

x—>rightThread == true, then successor of x is x—>rightChild by definition
x—>rightThread == false, then walk through left-child link from the right child of x until leftThread == true is reach.

7. Selection Trees

Motivation :

We have k ordered sequences(run), that are to be merged into a single ordered sequence.

Two types :

winner trees : a complete binary tree in which each node represents the smaller of its two children.

The root has the smallest.
height of the selection tree, O(log k).
if there are n records in k runs, merging can be done in O(nlogk)

loser trees : record loser rather than winner.

[感覺] loser tree 較直接，直接跟 parent 比！

winner tree.

loser tree.

三、Forest

Forest 化成 binary tree :

將 forest 中毎棵 tree 先化成 binary tree。
將各 binary tree 的 root 以 sibling 方式連結。
針對這些 roots 順時針轉 45 度。

Forest 的 traversal :

Forest 的 preorder、inorder等於「化成 BT 後再利用 BT 的 preorder、inorder traversal」。
Forest 的 postorder 不等於「化成 BT 後再利用 BT 的 postorder traversal」。

Forest 轉 binary tree。

四、Disjoint Sets

Data structure :

tree : the use of trees int the representation of sets
array : 每個元素用 parent 欄位存 parent 代碼，root 沒 parent 存 -1 或自己。

Operations :

Find(i). Find the set containing element i. trace the parent links to the root. worst case : O(n)
Union(U,V). Make one of the roots has the parent pointer point the root of the other tree. worst case : O(1)

Weighting rule for Union(U, V) :

加一個變數紀錄 node 數量，node 數量多的當 root。
防止隨便 Union 結果可能 tree 變成 linked list，造成 search 變慢。
此操作後 Find(i) 變快，為 O(logn)。

Collapsing rule :

Find() 時把經過的節點拆掉裝到 root 上。
Pay some more efforts this time, but the Find() operations in the future could save time.
Find takes O(α(p, q)) time，α(p, q) is a function which is the inverse of the the Ackermann’s function A(i, j).

[感覺] Ackermann's function 增加得比 $2^{2^{2^{...}}}$ 還快，所以其反函式增加超級慢。

Disjoint set 應用：

equivalence relationship
kruskal algorithm
檢驗 spanning tree 是否形成 cycle

References

待補

Pages

2017年9月7日星期四