Taro Kuriyama

Palindromes II

Palindromes I visualized a naive algorithm that finds the longest palindromic substring with worst-case complexity of \(O(n^2)\). The below compares the same naive algorithm with a linear-time algorithm, tracing the algorithm path as a given string is scanned from left to right.

The two input string examples are relatively difficult ones, in the sense that repeated characters and overlapping palindromic substrings cause the naive algorithm to approach its worst-case complexity of \(O(n^2)\).

Each circle, or node, represents a character visited by the algorithm. The visualization shows that the naive algorithm on the left performs significant duplicate work by visiting the same nodes repeatedly, producing a relatively dense graph. In contrast, the linear-time algorithm on the right visits fewer nodes and terminates in fewer iterations, producing a sparse graph. (For illustrative purposes, the naive algorithm is not optimized to terminate as early as possible.) The longest palindromic substring is identified by two green points on the x axis with the greatest distance between them. In the first case of input string "xxxxxxxaxxx", there are two solutions of equal length: "xxxxxxx" and "xxxaxxx". In the second case of input string "babcbabcbac", the solution is "abcbabcba".

The specifics of the linear-time algorithm are discussed further below.

Left: quadratic algorithm. Right: linear algorithm.
>> Restart animation.
Left: quadratic algorithm. Right: linear algorithm.
>> Restart animation.

Built with D3.

Linear-Time Algorithm

Wikipedia documents Glenn Manacher's 1975 paper as the origins of a linear time algorithm, followed by developments in the 1990s to extend Manacher's idea to find the longest palindromic substring.

I found the formal papers difficult to parse, but a Google search returns some more accessible results: The above visualization shows that in comparison to the given naive algorithm, the linear-time algorithm is clearly superior. Moreover, it's not hard to believe that the running time of the latter is indeed linear (perhaps if one were feeling generous). But the graphs don't really convey the logic behind the linear-time algorithm -- in fact, the output can seem rather arbitrary. Rather than rehash the full details and proof of the algorithm, the following section attempts to explain just its crux, which eluded me for some time.

> Link to Github code.

If there are no palindromic substrings, or if there is only one palindromic substring, the naive algorithm described in Palindromes I terminates in \(O(n)\) time. This follows directly from the fact that constant time is required for each step (and if a palindromic substring is present, an additional maximum of \(n\)). The challenge arises when there are multiple, overlapping palindromic substrings, as in both examples above. The problem is especially clear in the case of input string "xxxxxxxaxxx". How can duplicate work be avoided?

The key to saving effort lies in the symmetrical nature of palindromes: by definition, all palindromic strings are reflections of themselves around the center. We proceed by taking advantage of this fact.

Let's say we have a situation like the graph below, in which:

Mouseover \(palLen\) from center to left to follow the algorithm's progress (as explained below). The pink line represents the palindrome centered at \(pLeft\), were it to be fully reflected at \(pRight\).

>>Reset graph.

Rather than naively moving the center to place 12, we attempt to fill in \(palLen\) to the right of the current center (the "?" characters). For each position \(pRight\) and its \(distToEdge\), we proceed by examining the value of its pair \(palLen[pLeft]\). Three cases are possible. Recall that, to a large extent, palindromes centered at \(pLeft\) and \(pRight\) should mirror each other.

Given the above, the rest of the algorithm is mostly a matter of correct bookkeeping as the string is processed until \(R\) reaches the end.