Friday, June 5, 2015

Shortest Palindrome and KMP algorithm: A little thought!

In leetcode, Shortest Palindrome is one of the site's interesting algorithmic problems. It states that from a string s find the shortest palindrome by adding some characters to the front of s.

If you have never tried to solve this problem, I suggest that you solve it, and it will help you improve your problem solving skill.

After solving it, I kept looking for better solutions. I stumbled upon another programmer's solution. It is in python, and really neat. It is really interesting, but later I found out it was wrong.
class Solution:
    # @param {string} s
    # @return {string}
    def shortestPalindrome(self, s):
        A=s+s[::-1]
        cont=[0]
        for i in range(1,len(A)):
            index=cont[i-1]
            while(index>0 and A[index]!=A[i]):
                index=cont[index-1]
            cont.append(index+(1 if A[index]==A[i] else 0))
        print cont[-1]
        return s[cont[-1]:][::-1]+s

If you know python, please take some time to digest the Solution. By 2015-06-05, this solution is still accepted by leetcode. (Updated: LeetCode now added test cases to check this issue)

I myself looked at the Solution and saw it's interesting idea. At first, the algorithm concatenates the string and its reversed version. Then the following steps are similar to the steps for building KMP-table (or failure function) using in KMP algorithm. Why does this procedure work?

If you know KMP text searching algorithm, you will know its "lookup table" and steps to build it. Right now, I just show one important use of the table: it can show you the longest prefix of a string that is also suffix of s (but not itself). For example, "abcdabc" has the longest prefix which is also a suffix: "abc" (not "abcdabc" since this is the entire string!!!). To make it fun, we call this prefix is "happy substring" of s. So the happy substring of "aaaaaaaaaa" (10 a's ) is "aaaaaaaaa" (9 a's).

Now we go back and see how finding happy sub string of s can help solve the shortest palindrome problem.

Suppose that q is the shortest string added to the front of s to make the string qs is a palindrome. We can see that obviously length(q) < length(s) since ss is also a palindrome. Since qs is a palindrome, qs must end with q, or s = p+q where p is a sub string of s. Easily we see that p is also a palindrome. Therefore, in order to have shortest qs, q needs to be shortest. In turn, p is the longest palindromic sub string of s.

We call s' and q' are the reversed strings of s and q respectively. We see that s = pq, s' = q'p since p is a palindrome. So ss' = pqq'p . Now we need to find the longest p. Eureka! This also means that p is a happy sub string of the string ss'. That's how the above algorithm works!!!

However, after some thought, the above algorithm has some loophole. p is not a happy sub string of ss'! In fact, p is the longest prefix that is also a suffix of ss', but the prefix and suffix must not overlap each other. So let's make it more fun, we call "extremely happy sub string" of a string s is the longest sub string of s that is a prefix and also a suffix and this prefix and suffix must not overlap. On the other word, the "extremely happy sub string" of s must have length less than or equal half length of s. 

So it turns out the "happy sub string" of ss' is not always "extremely happy sub string" of ss'. We can easily construct an example: s = "aabba". ss'="aabbaabbaa". The happy sub string of "aabbaabbaa" is "aabbaa", while the extremely happy sub string of "aabbaabbaa" is "aa". Bang!

Hence, the correct solution should be as following, based on the observation that length(p) <= length(ss')/2.

class Solution:
    # @param {string} s
    # @return {string}
    def shortestPalindrome(self, s):
        A=s+s[::-1]
        cont=[0]
        for i in range(1,len(A)):
            index=cont[i-1]
            while(index>0):
                if(A[index]==A[i]):
                    if index < len(s):
                        break
                index=cont[index-1]
            cont.append(index+(1 if A[index]==A[i] else 0))
        print cont[-1]
        return s[cont[-1]:][::-1]+s


Hooray!
As you can see, algorithms are interesting! And we programmers are responsible for making correct solutions carefully studying the problems. The only way for us is to keep learning, "Train dragons the hard way"!

1 comment:

  1. I think here "since ss is also a palindrome" u mean ss' right?
    s = "ab" ss = "abab" --> not palindrome.
    and here " qs must end with q, or s = p+q where p is a sub string of s" u mean s = p+q'?

    ReplyDelete