Is this the Right Match?
We
are mere human beings and when we want to find the right match we have so many
tools at our disposal and that too without even our knowledge. That’s God's
power of matching algorithm, you know you are using it but you don't realize
its complexity because you are so much overwhelmed by its simplicity.
Matching
engines are not unknown any more. In computing world when we talk about match
and merge we try to quantify and qualify information under groups.
Information’s no longer remains information they are data and there are
attributes/nature of those data that we try manipulate, compute and calculate.
From
1918 till this date, ever since Soundex was patented there are series of
algorithms that mathematicians and computing engineers have tried creating but
the very base remained same. In order to compute the likeness of a word we need
to compute it mainly in two forms
1. Exact, which is very much like stripping
the word into ASCI or binary and comparing them.
2. Fuzzy/Phonetic/Similar,
which is to create codes out of its pronunciation to match, which is indeed
complex but very matured.
Now lets take it to a level above, what happens when
we match sentence to sentence? You cannot break every word in a sentence to
compare. Obviously for group of words the algorithm differs but so far we have
the knowledge of tools matching names, and matching addresses. What if we have
to compute a mechanism to match phrases? Well! The complexity is rising in many
fold and so is the awes of the fact that when we use God's match rules we can
apply that to any thing and any form and most of the time without even thinking
that we are doing it. We can apply it to names, to places, to people and their
types, to smells to colors even to relations.
Next time if you ever think,
"What's the Right Match?' try thinking how are you deriving that it is the
right match.. most of the time the answer will be it's that guy whom we all
know as the inner self.
All boils down to the simplest form of matching that we can programmatically obtain and is often known as String Matching.
String Matching involves two major type of algorithms:
1. String Matching Algorithms a.k.a String Searching Algorithms
In order to match a string you will have to identify the string. Pick it and try searching for that string in a given set of strings.
Kind of String Searching:
A common string searching method is to select the candidate to match or search across. This will narrow down the list and also the expense to perform the search.
Another common example involves "normalization". For many purposes, a search for a phrase such as "to be" should succeed even in places where there is something else intervening between the "to" and the "be":
- More than one space
- Other "whitespace" characters such as tabs, non-breaking spaces, line-breaks, etc.
- Less commonly, a hyphen or soft hyphen
- In structured texts, tags or even arbitrarily large but "parenthetical" things such as footnotes, list-numbers or other markers, embedded images, and so on.
Many symbol systems include characters that are synonymous (at least for some purposes):
- Latin-based alphabets distinguish lower-case from upper-case, but for many purposes string search is expected to ignore the distinction.
- Many languages include ligatures, where one composite character is equivalent to two or more other characters.
- Many writing systems involve diacritical marks such as accents or vowel points, which may vary in their usage, or be of varying importance in matching.
- DNA sequences can involve non-coding segments which may be ignored for some purposes, or polymorphisms that lead to no change in the encoded proteins, which may not count as a true difference for some other purposes.
- Some languages have rules where a different character or form of character must be used at the start, middle, or end of words.
Finally, for strings that represent natural language, aspects of the language itself become involved. For example, one might wish to find all occurrences of a "word" despite it having alternate spellings, prefixes or suffixes, etc.
Another more complex type of search is regular expression searching, where the user constructs a pattern of characters or other symbols, and any match to the pattern should fulfill the search.
and
2. Approximate String Matching
If you want to read more about it please visit:
String searching algorithm
or
Approximate string matching
Happy Wednesday!
-
Kinshuk Dutta
New
York
No comments:
Post a Comment