In the previous post, we discussed some basics of strings. Here we continue from where we left off.
Pattern Searching: Rabin-Karp Algorithm
Given a text txt and a pattern patrn, we have to write a function that prints all the occurrences of patrn in txt.
Assume that txt is longer than patrn.
Examples:
Input: txt[] = "IT IS A TOAST TEST"
patrn[] = "TOAST"
Output: Pattern found at index 8
Input: txt[] = "CAACABAACAAB"
patrn[] = "CAA"
Output: Pattern found at index 0
Pattern found at index 8
Like the Naive algorithm, the Rabin-Karp algorithm slides the pattern over the text one position at a time.
But unlike the Naive algorithm, the Rabin-Karp algorithm compares the hash value of the pattern with the hash value of the current substring of the text, and only if the hash values match does it start comparing individual characters. So the Rabin-Karp algorithm must calculate hash values for the following strings.
The pattern itself.
All the substrings of the text of length m, where m is the length of the pattern.
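Since we want to calculate hash values efficiently for all the substrings of size m of the text, we need a hash function with the following property: the hash of the next window, hash( txt[s+1 .. s+m] ), must be computable in constant time from the hash of the current window, hash( txt[s .. s+m-1] ), and the next character txt[s+m].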
The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a string is its numeric value, reading each character as one digit. For instance, if the only possible characters were the ten digits 0 to 9, the numeric value of "132" would be 132. In practice the number of possible characters is larger than 10 (256 in general) and the pattern can be long. Therefore the numeric values can't practically be stored in an integer.
Therefore, the numeric value is calculated using modular arithmetic to make sure that the hash values can be stored in an integer variable (that is, they fit in a machine word). To rehash, we take off the most significant digit and add the new least significant digit to the hash value. Rehashing is done using the following formula.
hash( txt[s+1 .. s+m] ) = ( d ( hash( txt[s .. s+m-1]) - txt[s] * h ) + txt[s + m] ) mod pr
Here,
hash( txt[s .. s+m-1] ): Hash value at shift s
hash( txt[s+1 .. s+m] ): Hash value at the next shift (shift s+1)
d: Number of characters in the alphabet
pr: A prime number
h: d^(m-1)
Implementation of Rabin Karp Algorithm Using C/C++:
#include <stdio.h>
#include <string.h>

// d is the number of characters
// in the input alphabet
#define d 256

/* patrn -> pattern
   txt   -> text
   pr    -> A prime number
*/
void search(char patrn[], char txt[], int pr)
{
    int M = strlen(patrn);
    int N = strlen(txt);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for the current window of txt
    int h = 1;

    // Nothing to search if the pattern is longer than the text
    if (M > N)
        return;

    // The value of h would be "pow(d, M-1) % pr"
    for (i = 0; i < M-1; i++)
        h = (h*d) % pr;

    // Calculate the hash value of the pattern and of the first
    // window of text; unsigned char keeps both hashes non-negative
    for (i = 0; i < M; i++)
    {
        p = (d*p + (unsigned char)patrn[i]) % pr;
        t = (d*t + (unsigned char)txt[i]) % pr;
    }

    // Slide the pattern over text one by one
    for (i = 0; i <= N - M; i++)
    {
        /* Check the hash values of current window of text
           and pattern. Only if the hash values match,
           check the characters one by one */
        if ( p == t )
        {
            // Check for characters one by one
            for (j = 0; j < M; j++)
            {
                if (txt[i+j] != patrn[j])
                    break;
            }

            // if p == t and patrn[0...M-1] = txt[i, i+1, ...i+M-1]
            if (j == M)
                printf("Pattern found at index %d \n", i);
        }

        // Remove leading digit, add trailing digit
        if ( i < N-M )
        {
            t = (d*(t - (unsigned char)txt[i]*h) + (unsigned char)txt[i+M]) % pr;

            // converting negative value of t to positive
            if (t < 0)
                t = (t + pr);
        }
    }
}

int main()
{
    char txt[] = "HELLO I AM A GEEK";
    char patrn[] = "GEEK";
    int pr = 101; // A prime number
    search(patrn, txt, pr);
    return 0;
}
Output:
Pattern found at index 13
Time Complexity:
Average and best-case: O(n+m)
Worst-case: O(nm).
The worst case of the Rabin-Karp algorithm occurs when all characters of the pattern and the text are the same, because then the hash values of all the substrings of txt match the hash value of patrn and every window has to be checked character by character. For example, patrn = "BBB" and txt = "BBBBBBB".
Problems
Most asked problems on strings:
Reverse words in a string
Maximum Occurring Character
Longest Palindrome Substring
Anagram
Implement strstr
Check if strings are rotations of each other or not
Happy Coding!
Please write comments if you find any bug in the above code/algorithm, or find other ways to solve the same problem.