Tree-based indexing, a multi-level indexing technique is an efficient way to store and retrieve data in databases, especially when dealing with large datasets. Two widely used tree-based structures in database indexing are B-trees and B+ trees. These structures ensure logarithmic time complexity for search, insertion, and deletion operations. This section covers the workings of these structures, their differences, and their use in databases.
B-tree
A B-tree is a self-balancing search tree that keeps data sorted and allows searches, sequential access, insertions, and deletions to be performed in $O(\log(n))$ time. Unlike binary search trees, B-trees allow nodes to have multiple keys and more than two children, ensuring the tree remains balanced.
Key Features
- Data is sorted within each node in ascending order.
- Nodes contain multiple keys and pointers to child nodes.
- The left child of a key contains smaller keys, and the right child contains larger keys.
- All leaf nodes are at the same depth.
Structure of a B-tree
Below is an example of a B-tree with three levels:
- Each node contains keys (e.g.,
100,155,226) and pointers to child nodes. - The keys are arranged in ascending order, and child nodes split the data into ranges. For example:
- Keys less than
100go to the left child. - Keys between
100and155go to the middle child. - Keys greater than
155go to the right child.
- Keys less than
Why Use B-tree Indexing?
Imagine storing a list of numbers in an unsorted array. Searching for a value would require scanning every element in the worst case, resulting in $O(n)$ time complexity. Sorting the array and applying binary search improves the complexity to $O(\log(n))$, but maintaining the sorted order during insertions or deletions becomes inefficient.
How B-trees Improve Efficiency
- They maintain sorted data dynamically, avoiding the overhead of re-sorting.
- Searching is efficient due to the balanced tree structure.
- Each node contains multiple keys, reducing the height of the tree and the number of disk accesses.
Example Search in a B-tree
Suppose you are searching for 117 in the above B-tree:
- Start at the root node (
100,155,226). - Since
117is greater than100but less than155, move to the middle child node (128,140). - Within this node, move to the left child node (
105,117). - Locate the key
117.
This search involves only three levels, demonstrating the efficiency of (O(\log(n))) operations.
B+ Tree
A B+ tree is an extension of the B-tree. It stores all the data at the leaf nodes and keeps internal nodes for navigation only. Additionally, leaf nodes are linked, enabling efficient sequential access.
Key Features
- Internal nodes contain keys and pointers to child nodes.
- All data is stored in the leaf nodes.
- Leaf nodes are linked, facilitating fast range queries.
- Like B-trees, all leaf nodes are at the same depth, maintaining balance.
Structure of a B+ Tree
Below is an example of a B+ tree:
- Internal nodes (e.g.,
13,30,9,11,16,38) contain keys for navigation. - Leaf nodes (e.g.,
1, 4, 9, 11, 12, 13, 16, 30, 38, 42) store all the data in sorted order. - Leaf nodes are connected via pointers, allowing efficient traversal for range queries.
Comparison of B-trees and B+ Trees
| Feature | B-tree | B+ tree |
|---|---|---|
| Data Storage | Keys and data are stored in all nodes | Data is stored only in leaf nodes |
| Range Queries | Less efficient | Highly efficient due to linked leaf nodes |
| Traversal | Requires accessing multiple nodes | Sequential traversal via linked leaves |
How Is Tree-Based Indexing Used in Databases?
Tree-based indexing, particularly B+ trees, is widely implemented in database systems to improve query performance.
1. Data Storage in B+ Trees
- Databases store records in a table with multiple columns (e.g.,
Name,Marks,Age). - A unique index is created for each record (e.g.,
Index), which acts as a key in the B+ tree.
Example Table:
Image scaled to 65%
2. Index Structure
- Each key in the B+ tree references a record in the database.
- Internal nodes facilitate navigation, while leaf nodes contain actual data or pointers to data pages.
Example B+ Tree for the Above Table:
- Keys (e.g.,
12,15,24,25) are used for indexing.- Leaf nodes store the actual data (e.g.,
Jone: Marks=5, Age=28).
- Leaf nodes store the actual data (e.g.,
Advantages of Indexing with B+ Trees
- Efficient Searches: Lookups take $O(\log(n))$ time for the index and sequentially access data in the leaf nodes.
- Reduced Disk Accesses: The hierarchical structure minimizes the number of pages read from disk.
- Faster Range Queries: Linked leaf nodes allow quick traversal of sorted data.
How It Works:
- The database first searches the index in the B+ tree to find the corresponding key.
- Once the key is located, the leaf node points to the actual data record in the table.
Tree-based indexing, particularly B+ trees, is fundamental to modern database systems. While B-trees offer efficient general-purpose indexing, B+ trees excel in scenarios requiring range queries and sequential data access. By leveraging these structures, databases can handle large datasets efficiently, ensuring fast and scalable performance.