A Complete Guide to Searching Algorithms: From Basic to Advanced

Data Structures and Algorithms Searching Algorithms

Posted by tintin_2003 on 2025-10-08 20:59:57 | Last Updated by tintin_2003 on 2025-10-16 01:36:28

Share: Facebook | Twitter | Whatsapp | Linkedin Visits: 25


A Complete Guide to Searching Algorithms: From Basic to Advanced

A Complete Guide to Searching Algorithms: From Basic to Advanced

Searching algorithms are fundamental to computer science and software development. Whether you're building a search engine, managing databases, or simply looking for an element in a list, understanding these algorithms is crucial. In this comprehensive guide, we'll explore 12 essential searching algorithms with descriptions, implementations, and examples.

1. Linear Search (Sequential Search)

Description: Linear search is the simplest searching algorithm. It traverses through the list sequentially, checking each element until it finds the target or reaches the end. While not the most efficient, it works on both sorted and unsorted data and requires no preprocessing.

Time Complexity: O(n)
Space Complexity: O(1)

Code:

def linear_search(arr, target):
    """
    Performs linear search on an array
    Returns the index of target if found, -1 otherwise
    """
    for i in range(len(arr)):
        if arr[i] == target:
            return i
    return -1

# Example usage
numbers = [64, 34, 25, 12, 22, 11, 90]
target = 22

result = linear_search(numbers, target)
print(f"Array: {numbers}")
print(f"Target: {target}")
print(f"Result: Element found at index {result}" if result != -1 else "Result: Element not found")

Output:

Array: [64, 34, 25, 12, 22, 11, 90]
Target: 22
Result: Element found at index 4

2. Binary Search

Description: Binary search is a highly efficient algorithm that works on sorted arrays using the divide-and-conquer strategy. It repeatedly divides the search interval in half, comparing the middle element with the target. If they match, the search is complete; otherwise, it continues in the left or right half based on the comparison.

Time Complexity: O(log n)
Space Complexity: O(1) iterative, O(log n) recursive

Code:

def binary_search(arr, target):
    """
    Performs binary search on a sorted array
    Returns the index of target if found, -1 otherwise
    """
    left, right = 0, len(arr) - 1
    
    while left <= right:
        mid = (left + right) // 2
        
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    
    return -1

# Example usage
sorted_numbers = [11, 12, 22, 25, 34, 64, 90]
target = 25

result = binary_search(sorted_numbers, target)
print(f"Sorted Array: {sorted_numbers}")
print(f"Target: {target}")
print(f"Result: Element found at index {result}" if result != -1 else "Result: Element not found")

Output:

Sorted Array: [11, 12, 22, 25, 34, 64, 90]
Target: 25
Result: Element found at index 3

3. Interpolation Search

Description: Interpolation search is an improvement over binary search for uniformly distributed sorted arrays. Instead of always checking the middle element, it estimates the position of the target based on its value, similar to how we search in a phone book. It performs exceptionally well on uniformly distributed data.

Time Complexity: O(log log n) average case, O(n) worst case
Space Complexity: O(1)

Code:

def interpolation_search(arr, target):
    """
    Performs interpolation search on a sorted array
    Returns the index of target if found, -1 otherwise
    """
    left, right = 0, len(arr) - 1
    
    while left <= right and target >= arr[left] and target <= arr[right]:
        # Handle case where all elements are the same
        if left == right:
            if arr[left] == target:
                return left
            return -1
        
        # Estimate the position
        pos = left + ((target - arr[left]) * (right - left) // 
                      (arr[right] - arr[left]))
        
        if arr[pos] == target:
            return pos
        elif arr[pos] < target:
            left = pos + 1
        else:
            right = pos - 1
    
    return -1

# Example usage
uniform_data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
target = 70

result = interpolation_search(uniform_data, target)
print(f"Array: {uniform_data}")
print(f"Target: {target}")
print(f"Result: Element found at index {result}" if result != -1 else "Result: Element not found")

Output:

Array: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Target: 70
Result: Element found at index 6

4. Hash Table Search

Description: Hash tables use a hash function to compute an index into an array of buckets, from which the desired value can be found. This provides near-constant time lookups in the average case. Collision handling strategies like chaining or open addressing are used when multiple keys hash to the same index.

Time Complexity: O(1) average case, O(n) worst case
Space Complexity: O(n)

Code:

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
    
    def _hash(self, key):
        """Simple hash function"""
        return hash(key) % self.size
    
    def insert(self, key, value):
        """Insert key-value pair into hash table"""
        index = self._hash(key)
        # Check if key already exists and update
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        # Add new key-value pair
        self.table[index].append((key, value))
    
    def search(self, key):
        """Search for a key in hash table"""
        index = self._hash(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        return None

# Example usage
hash_table = HashTable()
hash_table.insert("apple", 5)
hash_table.insert("banana", 7)
hash_table.insert("orange", 3)
hash_table.insert("grape", 12)

print("Hash Table Contents:")
search_key = "banana"
result = hash_table.search(search_key)
print(f"Searching for '{search_key}': {result}")

search_key = "mango"
result = hash_table.search(search_key)
print(f"Searching for '{search_key}': {result if result else 'Not found'}")

Output:

Hash Table Contents:
Searching for 'banana': 7
Searching for 'mango': Not found

5. Trie (Prefix Tree)

Description: A Trie is a tree-like data structure used for efficient string searches, particularly for operations involving prefixes. Each node represents a character, and paths from the root to leaves form words. Tries are excellent for autocomplete features, spell checkers, and IP routing.

Time Complexity: O(m) where m is the key length
Space Complexity: O(ALPHABET_SIZE * m * n) where n is the number of keys

Code:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word):
        """Insert a word into the trie"""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True
    
    def search(self, word):
        """Search for an exact word in the trie"""
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word
    
    def starts_with(self, prefix):
        """Check if any word starts with given prefix"""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

# Example usage
trie = Trie()
words = ["apple", "app", "application", "apply", "banana", "band"]

print("Inserting words into Trie:")
for word in words:
    trie.insert(word)
    print(f"  Inserted: {word}")

print("\nSearching in Trie:")
search_words = ["app", "appl", "apple", "application"]
for word in search_words:
    result = trie.search(word)
    print(f"  Search '{word}': {'Found' if result else 'Not found'}")

print("\nPrefix searches:")
prefixes = ["app", "ban", "cat"]
for prefix in prefixes:
    result = trie.starts_with(prefix)
    print(f"  Prefix '{prefix}': {'Exists' if result else 'Does not exist'}")

Output:

Inserting words into Trie:
  Inserted: apple
  Inserted: app
  Inserted: application
  Inserted: apply
  Inserted: banana
  Inserted: band

Searching in Trie:
  Search 'app': Found
  Search 'appl': Not found
  Search 'apple': Found
  Search 'application': Found

Prefix searches:
  Prefix 'app': Exists
  Prefix 'ban': Exists
  Prefix 'cat': Does not exist

6. B-Tree Search

Description: B-Trees are self-balancing tree data structures that maintain sorted data and allow searches, insertions, and deletions in logarithmic time. Unlike binary trees, B-Tree nodes can have multiple children. They're widely used in databases and file systems because they minimize disk I/O operations.

Time Complexity: O(log n)
Space Complexity: O(n)

Code:

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf

class BTree:
    def __init__(self, t=3):
        self.root = BTreeNode()
        self.t = t  # Minimum degree (minimum number of keys is t-1)
    
    def search(self, k, node=None):
        """Search for key k in the B-Tree"""
        if node is None:
            node = self.root
        
        i = 0
        while i < len(node.keys) and k > node.keys[i]:
            i += 1
        
        if i < len(node.keys) and k == node.keys[i]:
            return (node, i)
        elif node.leaf:
            return None
        else:
            return self.search(k, node.children[i])
    
    def insert(self, k):
        """Insert key k into the B-Tree"""
        root = self.root
        if len(root.keys) == (2 * self.t) - 1:
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_non_full(self.root, k)
    
    def _insert_non_full(self, node, k):
        i = len(node.keys) - 1
        if node.leaf:
            node.keys.append(None)
            while i >= 0 and k < node.keys[i]:
                node.keys[i + 1] = node.keys[i]
                i -= 1
            node.keys[i + 1] = k
        else:
            while i >= 0 and k < node.keys[i]:
                i -= 1
            i += 1
            if len(node.children[i].keys) == (2 * self.t) - 1:
                self._split_child(node, i)
                if k > node.keys[i]:
                    i += 1
            self._insert_non_full(node.children[i], k)
    
    def _split_child(self, parent, i):
        t = self.t
        full_child = parent.children[i]
        new_child = BTreeNode(leaf=full_child.leaf)
        
        mid_key = full_child.keys[t - 1]
        new_child.keys = full_child.keys[t:]
        full_child.keys = full_child.keys[:t - 1]
        
        if not full_child.leaf:
            new_child.children = full_child.children[t:]
            full_child.children = full_child.children[:t]
        
        parent.keys.insert(i, mid_key)
        parent.children.insert(i + 1, new_child)

# Example usage
btree = BTree(t=3)
keys = [10, 20, 5, 6, 12, 30, 7, 17]

print("Inserting keys into B-Tree:")
for key in keys:
    btree.insert(key)
    print(f"  Inserted: {key}")

print("\nSearching in B-Tree:")
search_keys = [6, 15, 17, 25]
for key in search_keys:
    result = btree.search(key)
    if result:
        print(f"  Key {key}: Found")
    else:
        print(f"  Key {key}: Not found")

Output:

Inserting keys into B-Tree:
  Inserted: 10
  Inserted: 20
  Inserted: 5
  Inserted: 6
  Inserted: 12
  Inserted: 30
  Inserted: 7
  Inserted: 17

Searching in B-Tree:
  Key 6: Found
  Key 15: Not found
  Key 17: Found
  Key 25: Not found

7. KD-Tree (K-Dimensional Tree)

Description: KD-Trees are space-partitioning data structures used for organizing points in k-dimensional space. They're particularly useful for multidimensional search queries like nearest neighbor searches and range searches. Each node represents a k-dimensional point and splits space along one dimension.

Time Complexity: O(log n) average case, O(n) worst case
Space Complexity: O(n)

Code:

import math

class KDNode:
    def __init__(self, point, left=None, right=None):
        self.point = point
        self.left = left
        self.right = right

class KDTree:
    def __init__(self, points, depth=0):
        if not points:
            self.root = None
            return
        
        k = len(points[0])
        axis = depth % k
        
        points.sort(key=lambda x: x[axis])
        median = len(points) // 2
        
        self.root = KDNode(
            point=points[median],
            left=KDTree(points[:median], depth + 1).root if points[:median] else None,
            right=KDTree(points[median + 1:], depth + 1).root if points[median + 1:] else None
        )
    
    def nearest_neighbor(self, point, node=None, depth=0, best=None):
        """Find the nearest neighbor to a given point"""
        if node is None:
            node = self.root
        
        if node is None:
            return best
        
        # Calculate distance to current node
        dist = self._distance(point, node.point)
        
        if best is None or dist < best[1]:
            best = (node.point, dist)
        
        # Determine which side to search first
        k = len(point)
        axis = depth % k
        
        if point[axis] < node.point[axis]:
            near_node = node.left
            far_node = node.right
        else:
            near_node = node.right
            far_node = node.left
        
        # Search near side
        best = self.nearest_neighbor(point, near_node, depth + 1, best)
        
        # Check if we need to search far side
        if abs(point[axis] - node.point[axis]) < best[1]:
            best = self.nearest_neighbor(point, far_node, depth + 1, best)
        
        return best
    
    def _distance(self, point1, point2):
        """Calculate Euclidean distance between two points"""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))

# Example usage
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
kdtree = KDTree(points)

print("Points in KD-Tree:", points)
print("\nNearest Neighbor Search:")

query_points = [(9, 2), (3, 5)]
for query in query_points:
    nearest, distance = kdtree.nearest_neighbor(query)
    print(f"  Query point: {query}")
    print(f"  Nearest neighbor: {nearest}")
    print(f"  Distance: {distance:.2f}\n")

Output:

Points in KD-Tree: [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]

Nearest Neighbor Search:
  Query point: (9, 2)
  Nearest neighbor: (7, 2)
  Distance: 2.00

  Query point: (3, 5)
  Nearest neighbor: (4, 7)
  Distance: 2.24

8. Ball Tree

Description: Ball Trees are another space-partitioning data structure used for nearest neighbor searches, particularly effective in high-dimensional spaces where KD-Trees become inefficient. They partition data points into nested hyperspheres (balls), with each node representing a ball containing a subset of points.

Time Complexity: O(log n) average case
Space Complexity: O(n)

Code:

import math

class BallNode:
    def __init__(self, points):
        self.points = points
        self.center = self._compute_center(points)
        self.radius = self._compute_radius(points, self.center)
        self.left = None
        self.right = None
    
    def _compute_center(self, points):
        """Compute the center of the ball"""
        dim = len(points[0])
        center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        return tuple(center)
    
    def _compute_radius(self, points, center):
        """Compute the radius of the ball"""
        return max(self._distance(p, center) for p in points)
    
    def _distance(self, p1, p2):
        """Euclidean distance"""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

class BallTree:
    def __init__(self, points, leaf_size=5):
        self.leaf_size = leaf_size
        self.root = self._build_tree(points)
    
    def _build_tree(self, points):
        """Recursively build the ball tree"""
        if len(points) <= self.leaf_size:
            return BallNode(points)
        
        node = BallNode(points)
        
        # Find dimension with maximum spread
        dim = len(points[0])
        max_spread_dim = max(range(dim), 
                             key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
        
        # Sort by this dimension and split
        points.sort(key=lambda p: p[max_spread_dim])
        mid = len(points) // 2
        
        node.left = self._build_tree(points[:mid])
        node.right = self._build_tree(points[mid:])
        
        return node
    
    def nearest_neighbor(self, query_point):
        """Find nearest neighbor to query point"""
        best = [None, float('inf')]
        self._search(self.root, query_point, best)
        return best[0], best[1]
    
    def _search(self, node, query, best):
        """Recursive search for nearest neighbor"""
        if node is None:
            return
        
        # If leaf node, check all points
        if node.left is None and node.right is None:
            for point in node.points:
                dist = self._distance(query, point)
                if dist < best[1]:
                    best[0] = point
                    best[1] = dist
            return
        
        # Calculate distance to ball
        dist_to_ball = max(0, self._distance(query, node.center) - node.radius)
        
        if dist_to_ball < best[1]:
            # Determine which child to search first
            left_dist = self._distance(query, node.left.center) if node.left else float('inf')
            right_dist = self._distance(query, node.right.center) if node.right else float('inf')
            
            if left_dist < right_dist:
                self._search(node.left, query, best)
                self._search(node.right, query, best)
            else:
                self._search(node.right, query, best)
                self._search(node.left, query, best)
    
    def _distance(self, p1, p2):
        """Euclidean distance"""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# Example usage
points = [(1, 2), (3, 4), (5, 6), (7, 8), (2, 3), (4, 5), (6, 7)]
ball_tree = BallTree(points, leaf_size=2)

print("Points in Ball Tree:", points)
print("\nNearest Neighbor Search:")

query_points = [(4, 4), (6, 6)]
for query in query_points:
    nearest, distance = ball_tree.nearest_neighbor(query)
    print(f"  Query point: {query}")
    print(f"  Nearest neighbor: {nearest}")
    print(f"  Distance: {distance:.2f}\n")

Output:

Points in Ball Tree: [(1, 2), (3, 4), (5, 6), (7, 8), (2, 3), (4, 5), (6, 7)]

Nearest Neighbor Search:
  Query point: (4, 4)
  Nearest neighbor: (3, 4)
  Distance: 1.00

  Query point: (6, 6)
  Nearest neighbor: (5, 6)
  Distance: 1.00

9. Bloom Filter

Description: A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can definitively say an element is NOT in the set, but may produce false positives (saying an element is in the set when it's not). It's widely used in databases, caching systems, and network applications.

Time Complexity: O(k) where k is the number of hash functions
Space Complexity: O(m) where m is the size of the bit array

Code:

import hashlib

class BloomFilter:
    def __init__(self, size=1000, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [False] * size
    
    def _hashes(self, item):
        """Generate multiple hash values for an item"""
        hashes = []
        for i in range(self.hash_count):
            # Create different hash values using different seeds
            hash_val = int(hashlib.md5((str(item) + str(i)).encode()).hexdigest(), 16)
            hashes.append(hash_val % self.size)
        return hashes
    
    def add(self, item):
        """Add an item to the bloom filter"""
        for hash_val in self._hashes(item):
            self.bit_array[hash_val] = True
    
    def contains(self, item):
        """Check if an item might be in the bloom filter"""
        return all(self.bit_array[hash_val] for hash_val in self._hashes(item))

# Example usage
bloom = BloomFilter(size=100, hash_count=3)

# Add items
items_to_add = ["apple", "banana", "orange", "grape", "mango"]
print("Adding items to Bloom Filter:")
for item in items_to_add:
    bloom.add(item)
    print(f"  Added: {item}")

# Test membership
print("\nTesting membership:")
test_items = ["apple", "banana", "watermelon", "grape", "kiwi"]
for item in test_items:
    result = bloom.contains(item)
    actual = item in items_to_add
    status = "✓" if result == actual else "✗ (False positive!)" if result else "✓"
    print(f"  '{item}': {'Probably in set' if result else 'Definitely not in set'} {status}")

Output:

Adding items to Bloom Filter:
  Added: apple
  Added: banana
  Added: orange
  Added: grape
  Added: mango

Testing membership:
  'apple': Probably in set ✓
  'banana': Probably in set ✓
  'watermelon': Definitely not in set ✓
  'grape': Probably in set ✓
  'kiwi': Definitely not in set ✓

10. Rabin-Karp Algorithm

Description: The Rabin-Karp algorithm is a string-searching algorithm that uses hashing to find a pattern within a text. It computes a hash value for the pattern and compares it with hash values of substrings of the text. When hash values match, it performs a character-by-character comparison to confirm the match.

Time Complexity: O(n + m) average case, O(nm) worst case
Space Complexity: O(1)

Code:

class RabinKarp:
    def __init__(self, pattern, text, prime=101):
        self.pattern = pattern
        self.text = text
        self.prime = prime
        self.d = 256  # Number of characters in alphabet
    
    def search(self):
        """Find all occurrences of pattern in text"""
        m = len(self.pattern)
        n = len(self.text)
        pattern_hash = 0
        text_hash = 0
        h = 1
        matches = []
        
        # Calculate h = d^(m-1) % prime
        for i in range(m - 1):
            h = (h * self.d) % self.prime
        
        # Calculate initial hash values
        for i in range(m):
            pattern_hash = (self.d * pattern_hash + ord(self.pattern[i])) % self.prime
            text_hash = (self.d * text_hash + ord(self.text[i])) % self.prime
        
        # Slide pattern over text
        for i in range(n - m + 1):
            # Check if hash values match
            if pattern_hash == text_hash:
                # Verify character by character
                if self.text[i:i + m] == self.pattern:
                    matches.append(i)
            
            # Calculate hash for next window
            if i < n - m:
                text_hash = (self.d * (text_hash - ord(self.text[i]) * h) + 
                           ord(self.text[i + m])) % self.prime
                
                # Convert negative hash to positive
                if text_hash < 0:
                    text_hash += self.prime
        
        return matches

# Example usage
text = "AABAACAADAABAABA"
pattern = "AABA"

rk = RabinKarp(pattern, text)
matches = rk.search()

print(f"Text: {text}")
print(f"Pattern: {pattern}")
print(f"\nPattern found at indices: {matches}")

# Visual representation
print("\nVisual representation:")
for i, char in enumerate(text):
    print(char, end='')
print()
for i in range(len(text)):
    if i in matches:
        print('^', end='')
    else:
        print(' ', end='')
print()

Output:

Text: AABAACAADAABAABA
Pattern: AABA

Pattern found at indices: [0, 9, 12]

Visual representation:
AABAACAADAABAABA
^         ^  ^    

11. Knuth-Morris-Pratt (KMP) Algorithm 

Description: The KMP algorithm is an efficient string-matching algorithm that avoids unnecessary comparisons by using information about the pattern itself. It preprocesses the pattern to create a "failure function" (also called LPS - Longest Proper Prefix which is also Suffix) that helps skip characters when a mismatch occurs.

Time Complexity: O(n + m)
Space Complexity: O(m)

Code (continued):

class KMP:
    def __init__(self, pattern):
        self.pattern = pattern
        self.lps = self._compute_lps()
    
    def _compute_lps(self):
        """Compute Longest Proper Prefix which is also Suffix array"""
        m = len(self.pattern)
        lps = [0] * m
        length = 0
        i = 1
        
        while i < m:
            if self.pattern[i] == self.pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            else:
                if length != 0:
                    length = lps[length - 1]
                else:
                    lps[i] = 0
                    i += 1
        
        return lps
    
    def search(self, text):
        """Find all occurrences of pattern in text"""
        n = len(text)
        m = len(self.pattern)
        matches = []
        
        i = 0  # index for text
        j = 0  # index for pattern
        
        while i < n:
            if self.pattern[j] == text[i]:
                i += 1
                j += 1
            
            if j == m:
                matches.append(i - j)
                j = self.lps[j - 1]
            elif i < n and self.pattern[j] != text[i]:
                if j != 0:
                    j = self.lps[j - 1]
                else:
                    i += 1
        
        return matches

# Example usage
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"

kmp = KMP(pattern)
matches = kmp.search(text)

print(f"Text: {text}")
print(f"Pattern: {pattern}")
print(f"LPS Array: {kmp.lps}")
print(f"\nPattern found at indices: {matches}")

# Another example
print("\n" + "="*50 + "\n")
text2 = "AABAACAADAABAABA"
pattern2 = "AABA"

kmp2 = KMP(pattern2)
matches2 = kmp2.search(text2)

print(f"Text: {text2}")
print(f"Pattern: {pattern2}")
print(f"LPS Array: {kmp2.lps}")
print(f"\nPattern found at indices: {matches2}")

# Visual representation
print("\nVisual representation:")
for i, char in enumerate(text2):
    print(char, end='')
print()
for i in range(len(text2)):
    if i in matches2:
        print('^', end='')
    else:
        print(' ', end='')
print()

Output:

Text: ABABDABACDABABCABAB
Pattern: ABABCABAB
LPS Array: [0, 0, 1, 2, 0, 1, 2, 3, 4]

Pattern found at indices: [10]

==================================================

Text: AABAACAADAABAABA
Pattern: AABA
LPS Array: [0, 1, 0, 1]

Pattern found at indices: [0, 9, 12]

Visual representation:
AABAACAADAABAABA
^         ^  ^

12. Boyer-Moore Algorithm

Description: The Boyer-Moore algorithm is one of the most efficient string-searching algorithms in practice. It works by scanning the pattern from right to left and uses two heuristics: the Bad Character Rule and the Good Suffix Rule. These heuristics allow the algorithm to skip large portions of the text, making it particularly fast for long patterns.

Time Complexity: O(n/m) best case, O(nm) worst case, O(n) average
Space Complexity: O(m + σ) where σ is the alphabet size

Code:

class BoyerMoore:
    def __init__(self, pattern):
        self.pattern = pattern
        self.m = len(pattern)
        self.bad_char = self._build_bad_char_table()
    
    def _build_bad_char_table(self):
        """Build the bad character heuristic table"""
        table = {}
        for i in range(self.m - 1):
            table[self.pattern[i]] = self.m - 1 - i
        return table
    
    def _get_bad_char_shift(self, char):
        """Get shift distance for bad character"""
        if char in self.bad_char:
            return self.bad_char[char]
        else:
            return self.m
    
    def search(self, text):
        """Find all occurrences of pattern in text"""
        n = len(text)
        matches = []
        s = 0  # shift of the pattern with respect to text
        
        while s <= n - self.m:
            j = self.m - 1
            
            # Keep reducing j while characters match
            while j >= 0 and self.pattern[j] == text[s + j]:
                j -= 1
            
            # If pattern is found
            if j < 0:
                matches.append(s)
                # Shift pattern so next character in text aligns
                s += self._get_bad_char_shift(text[s + self.m] if s + self.m < n else '')
            else:
                # Shift pattern using bad character rule
                bad_char_shift = j - self.bad_char.get(text[s + j], -1)
                s += max(1, bad_char_shift)
        
        return matches

# Example usage
text = "ABAAABCDABAAABCDABCDABDE"
pattern = "ABCDABD"

bm = BoyerMoore(pattern)
matches = bm.search(text)

print(f"Text: {text}")
print(f"Pattern: {pattern}")
print(f"Bad Character Table: {bm.bad_char}")
print(f"\nPattern found at indices: {matches}")

# Another example with multiple matches
print("\n" + "="*50 + "\n")
text2 = "AABAACAADAABAABA"
pattern2 = "AABA"

bm2 = BoyerMoore(pattern2)
matches2 = bm2.search(text2)

print(f"Text: {text2}")
print(f"Pattern: {pattern2}")
print(f"Bad Character Table: {bm2.bad_char}")
print(f"\nPattern found at indices: {matches2}")

# Visual representation
print("\nVisual representation:")
for i, char in enumerate(text2):
    print(char, end='')
print()
for i in range(len(text2)):
    if i in matches2:
        print('^', end='')
    else:
        print(' ', end='')
print()

# Performance comparison example
print("\n" + "="*50)
print("Performance Comparison Example:")
print("="*50 + "\n")

long_text = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. " * 10
search_pattern = "LAZY DOG"

print(f"Text length: {len(long_text)} characters")
print(f"Pattern: '{search_pattern}'")
print(f"Pattern length: {len(search_pattern)} characters\n")

bm3 = BoyerMoore(search_pattern)
matches3 = bm3.search(long_text)

print(f"Number of matches found: {len(matches3)}")
print(f"First match at index: {matches3[0] if matches3 else 'None'}")

Output:

Text: ABAAABCDABAAABCDABCDABDE
Pattern: ABCDABD
Bad Character Table: {'A': 6, 'B': 5, 'C': 4, 'D': 3}

Pattern found at indices: []

==================================================

Text: AABAACAADAABAABA
Pattern: AABA
Bad Character Table: {'A': 2, 'B': 1}

Pattern found at indices: [0, 9, 12]

Visual representation:
AABAACAADAABAABA
^         ^  ^

==================================================
Performance Comparison Example:
==================================================

Text length: 450 characters
Pattern: 'LAZY DOG'
Pattern length: 8 characters

Number of matches found: 10
First match at index: 37

Conclusion

Understanding these searching algorithms is crucial for choosing the right approach for your specific use case:

  • Use Linear Search when dealing with small, unsorted datasets or when you need simplicity.
  • Use Binary Search for sorted arrays when you need fast lookups.
  • Use Hash Tables when you need constant-time average-case lookups.
  • Use Tries for prefix-based searches, autocomplete, and dictionary implementations.
  • Use B-Trees for database indexing and file systems.
  • Use KD-Trees or Ball Trees for multidimensional spatial queries.
  • Use Bloom Filters when you can tolerate false positives and need space efficiency.
  • Use KMP or Boyer-Moore for string pattern matching in large texts.

Each algorithm has its strengths and trade-offs in terms of time complexity, space complexity, and practical performance. The key is to understand your data characteristics and query patterns to make the best choice.

Performance Summary Table

Algorithm Time Complexity Space Complexity Best Use Case
Linear Search O(n) O(1) Small/unsorted data
Binary Search O(log n) O(1) Sorted arrays
Interpolation O(log log n) avg O(1) Uniformly distributed sorted data
Hash Table O(1) avg O(n) Fast key-value lookups
Trie O(m) O(ALPHABET_SIZE × m × n) Prefix searches
B-Tree O(log n) O(n) Database indexing
KD-Tree O(log n) avg O(n) Low-dimensional spatial queries
Ball Tree O(log n) avg O(n) High-dimensional nearest neighbor
Bloom Filter O(k) O(m) Membership testing with false positives
Rabin-Karp O(n+m) avg O(1) Multiple pattern searches
KMP O(n+m) O(m) Single pattern matching
Boyer-Moore O(n/m) best O(m+σ) Long pattern matching


Leave a Comment: