Tree Based Indexing Techniques

Tree-based indexing, a multi-level indexing technique is an efficient way to store and retrieve data in databases, especially when dealing with large datasets. Two widely used tree-based structures in database indexing are B-trees and B+ trees. These structures ensure logarithmic time complexity for search, insertion, and deletion operations. This section covers the workings of these structures, their differences, and their use in databases.

B-tree

A B-tree is a self-balancing search tree that keeps data sorted and allows searches, sequential access, insertions, and deletions to be performed in $O(\log(n))$ time. Unlike binary search trees, B-trees allow nodes to have multiple keys and more than two children, ensuring the tree remains balanced.

Key Features

Data is sorted within each node in ascending order.
Nodes contain multiple keys and pointers to child nodes.
The left child of a key contains smaller keys, and the right child contains larger keys.
All leaf nodes are at the same depth.

Structure of a B-tree

Below is an example of a B-tree with three levels:

1734440624849195

Each node contains keys (e.g., 100, 155, 226) and pointers to child nodes.
The keys are arranged in ascending order, and child nodes split the data into ranges. For example:
- Keys less than 100 go to the left child.
- Keys between 100 and 155 go to the middle child.
- Keys greater than 155 go to the right child.

Why Use B-tree Indexing?

Imagine storing a list of numbers in an unsorted array. Searching for a value would require scanning every element in the worst case, resulting in $O(n)$ time complexity. Sorting the array and applying binary search improves the complexity to $O(\log(n))$, but maintaining the sorted order during insertions or deletions becomes inefficient.

How B-trees Improve Efficiency

They maintain sorted data dynamically, avoiding the overhead of re-sorting.
Searching is efficient due to the balanced tree structure.
Each node contains multiple keys, reducing the height of the tree and the number of disk accesses.

Example Search in a B-tree

Suppose you are searching for 117 in the above B-tree:

Start at the root node (100, 155, 226).
Since 117 is greater than 100 but less than 155, move to the middle child node (128, 140).
Within this node, move to the left child node (105, 117).
Locate the key 117.

This search involves only three levels, demonstrating the efficiency of (O(\log(n))) operations.

B+ Tree

A B+ tree is an extension of the B-tree. It stores all the data at the leaf nodes and keeps internal nodes for navigation only. Additionally, leaf nodes are linked, enabling efficient sequential access.

Key Features

Internal nodes contain keys and pointers to child nodes.
All data is stored in the leaf nodes.
Leaf nodes are linked, facilitating fast range queries.
Like B-trees, all leaf nodes are at the same depth, maintaining balance.

Structure of a B+ Tree

Below is an example of a B+ tree:

1734440691306481

Internal nodes (e.g., 13, 30, 9, 11, 16, 38) contain keys for navigation.
Leaf nodes (e.g., 1, 4, 9, 11, 12, 13, 16, 30, 38, 42) store all the data in sorted order.
Leaf nodes are connected via pointers, allowing efficient traversal for range queries.

Comparison of B-trees and B+ Trees

Feature	B-tree	B+ tree
Data Storage	Keys and data are stored in all nodes	Data is stored only in leaf nodes
Range Queries	Less efficient	Highly efficient due to linked leaf nodes
Traversal	Requires accessing multiple nodes	Sequential traversal via linked leaves

How Is Tree-Based Indexing Used in Databases?

Tree-based indexing, particularly B+ trees, is widely implemented in database systems to improve query performance.

1. Data Storage in B+ Trees

Databases store records in a table with multiple columns (e.g., Name, Marks, Age).
A unique index is created for each record (e.g., Index), which acts as a key in the B+ tree.

Example Table:

1734440749462967 Image scaled to 65%

2. Index Structure

Each key in the B+ tree references a record in the database.
Internal nodes facilitate navigation, while leaf nodes contain actual data or pointers to data pages.

Example B+ Tree for the Above Table:

1734440787913902

Keys (e.g., 12, 15, 24, 25) are used for indexing.
- Leaf nodes store the actual data (e.g., Jone: Marks=5, Age=28).

Advantages of Indexing with B+ Trees

Efficient Searches: Lookups take $O(\log(n))$ time for the index and sequentially access data in the leaf nodes.
Reduced Disk Accesses: The hierarchical structure minimizes the number of pages read from disk.
Faster Range Queries: Linked leaf nodes allow quick traversal of sorted data.

How It Works:

The database first searches the index in the B+ tree to find the corresponding key.
Once the key is located, the leaf node points to the actual data record in the table.

Tree-based indexing, particularly B+ trees, is fundamental to modern database systems. While B-trees offer efficient general-purpose indexing, B+ trees excel in scenarios requiring range queries and sequential data access. By leveraging these structures, databases can handle large datasets efficiently, ensuring fast and scalable performance.