Palindromes II
Palindromes I visualized a naive algorithm that finds the longest palindromic substring with worst-case complexity of \(O(n^2)\). The below compares the same naive algorithm with a linear-time algorithm, tracing the algorithm path as a given string is scanned from left to right.
The two input string examples are relatively difficult ones, in the sense that repeated characters and overlapping palindromic substrings cause the naive algorithm to approach its worst-case complexity of \(O(n^2)\).
Each circle, or node, represents a character visited by the algorithm. The visualization shows that the naive algorithm on the left performs significant duplicate work by visiting the same nodes repeatedly, producing a relatively dense graph. In contrast, the linear-time algorithm on the right visits fewer nodes and terminates in fewer iterations, producing a sparse graph. (For illustrative purposes, the naive algorithm is not optimized to terminate as early as possible.)
The specifics of the linear-time algorithm are discussed further below.
Left: quadratic algorithm. Right: linear algorithm.
>> Restart animation.
Left: quadratic algorithm. Right: linear algorithm.
>> Restart animation.
Built with D3.
Linear-Time Algorithm
Wikipedia documents Glenn Manacher's 1975 paper as the origins of a linear time algorithm, followed by developments in the 1990s to extend Manacher's idea to find the longest palindromic substring.
I found the formal papers difficult to parse, but a Google search returns some more accessible results:
> Link to Github code.
If there are no palindromic substrings, or if there is only one palindromic substring, the naive algorithm described in Palindromes I terminates in \(O(n)\) time. This follows directly from the fact that constant time is required for each step (and if a palindromic substring is present, an additional maximum of \(n\)). The challenge arises when there are multiple, overlapping palindromic substrings, as in both examples above. The problem is especially clear in the case of input string "xxxxxxxaxxx". How can duplicate work be avoided?
The key to saving effort lies in the symmetrical nature of palindromes: by definition, all palindromic strings are reflections of themselves around the center. We proceed by taking advantage of this fact.
Let's say we have a situation like the graph below, in which:
Mouseover \(palLen\) from center to left to follow the algorithm's progress (as explained below). The pink line represents the palindrome centered at \(pLeft\), were it to be fully reflected at \(pRight\).
>>Reset graph.
Rather than naively moving the center to place 12, we attempt to fill in \(palLen\) to the right of the current center (the "?" characters). For each position \(pRight\) and its \(distToEdge\), we proceed by examining the value of its pair \(palLen[pLeft]\). Three cases are possible. Recall that, to a large extent, palindromes centered at \(pLeft\) and \(pRight\) should mirror each other.
Given the above, the rest of the algorithm is mostly a matter of correct bookkeeping as the string is processed until \(R\) reaches the end.
Palindromes I visualized a naive algorithm that finds the longest palindromic substring with worst-case complexity of \(O(n^2)\). The below compares the same naive algorithm with a linear-time algorithm, tracing the algorithm path as a given string is scanned from left to right.
The two input string examples are relatively difficult ones, in the sense that repeated characters and overlapping palindromic substrings cause the naive algorithm to approach its worst-case complexity of \(O(n^2)\).
Each circle, or node, represents a character visited by the algorithm. The visualization shows that the naive algorithm on the left performs significant duplicate work by visiting the same nodes repeatedly, producing a relatively dense graph. In contrast, the linear-time algorithm on the right visits fewer nodes and terminates in fewer iterations, producing a sparse graph. (For illustrative purposes, the naive algorithm is not optimized to terminate as early as possible.)
- Green point = presence of palindrome (single characters included).
- Red point = no palindrome.
- Gray points = space between characters.
The specifics of the linear-time algorithm are discussed further below.
Left: quadratic algorithm. Right: linear algorithm.
>> Restart animation.
>> Restart animation.
Built with D3.
Linear-Time Algorithm
Wikipedia documents Glenn Manacher's 1975 paper as the origins of a linear time algorithm, followed by developments in the 1990s to extend Manacher's idea to find the longest palindromic substring.
I found the formal papers difficult to parse, but a Google search returns some more accessible results:
- Johan Jeurig, "Finding palindromes efficiently"
- Fred Akalin, "Find the Longest Palindromic Substring in Linear Time"
- LeetCode, "Longest Palindromic Substring Part II"
- Stack Exchange, clarification of LeetCode example
> Link to Github code.
If there are no palindromic substrings, or if there is only one palindromic substring, the naive algorithm described in Palindromes I terminates in \(O(n)\) time. This follows directly from the fact that constant time is required for each step (and if a palindromic substring is present, an additional maximum of \(n\)). The challenge arises when there are multiple, overlapping palindromic substrings, as in both examples above. The problem is especially clear in the case of input string "xxxxxxxaxxx". How can duplicate work be avoided?
The key to saving effort lies in the symmetrical nature of palindromes: by definition, all palindromic strings are reflections of themselves around the center. We proceed by taking advantage of this fact.
Let's say we have a situation like the graph below, in which:
- The input string "babcbabcbaccba" is being scanned from left to right.
- At the current center of place 11, the longest palindrome is 9 characters long. The three lines labeled \(L\), \(C\), and \(R\) denote the left, center, and right of the palindrome.
- We have maintained an array called \(palLen\), which tracks the length of palindromes centered at each character seen so far. For example, \(palLen[11] == 9\) and \(palLen[7] == 7\).
- We define \(pRight\) and \(pLeft\) as the pair of "place" indices that are mirrored around the current center. For example, if \(pRight\) is 12, its pair \(pLeft\) is 10.
- We define \(distToEdge\) as \(R\) - \(pRight\). For example, at place 13, \(distToEdge\) = 20 - 13 = 7.
Mouseover \(palLen\) from center to left to follow the algorithm's progress (as explained below). The pink line represents the palindrome centered at \(pLeft\), were it to be fully reflected at \(pRight\).
>>Reset graph.
Rather than naively moving the center to place 12, we attempt to fill in \(palLen\) to the right of the current center (the "?" characters). For each position \(pRight\) and its \(distToEdge\), we proceed by examining the value of its pair \(palLen[pLeft]\). Three cases are possible. Recall that, to a large extent, palindromes centered at \(pLeft\) and \(pRight\) should mirror each other.
- Case 1: \(palLen[pLeft]\) < \(distToEdge\)
The palidrome centered at \(pLeft\) is reflected at \(pRight\), since the length of the palindrome does not exceed the right boundary \(R\).
Therefore: \(palLen[pRight]\) = \(palLen[pLeft]\). Do not move center.
- Case 2: \(palLen[pLeft]\) > \(distToEdge\)
The palidrome centered at \(pLeft\) cannot be entirely reflected at \(pRight\), since the length of the palindrome would exceed \(R\). However, we know by definition that the palindrome at \(pLeft\) is reflected at \(pRight\) up to a certain point -- precisely, \(R\). In addition, we know that the palindrome at \(pRight\) does not exceed \(R\). If it did, the palindrome centered at \(C\) would match beyond \(L\) and \(R\), and would thus be longer.
Therefore: \(palLen[pRight]\) = \(distToEdge\). Do not move center.
- Case 3: \(palLen[pLeft]\) == \(distToEdge\)
The palidrome centered at \(pLeft\) is reflected at \(pRight\), since the length of the palindrome does not exceed the right boundary \(R\). Unlike case 2, however, it is possible that the palindrome centered at \(pRight\) could extend further -- we don't know, so we need to expand beyond \(R\) using \(pRight\) as the new center.
Therefore: \(palLen[pRight]\) = \(distToEdge\). Move center to \(pRight\) and attempt to expand the palindrome beyond \(R\).
Given the above, the rest of the algorithm is mostly a matter of correct bookkeeping as the string is processed until \(R\) reaches the end.