Institut für Informatik

Technical Report No. 29, May 1991 - Abstract

Esko Ukkonen:
Approximate String Matching with q-grams and Maximal Matches

This report is not available in electronic form!

We study approximate string matching in connection with two string distance functions that are computable in linear time. The first function is based on the so-called q-grams. An algorithm is given for the associated string matching problem that finds the locally best approximate occurences of pattern P, |P|=m, in text T, |T|=n, in time O(n*log (m-q)). The occurences with distance <= k can be found in time O(n*log k). The other distance function is based on finding maximal common substrings and allows a form of approximate string matching in time O(n). Both distances give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edit distance based string matching.