It's been frustrating me for a while now that the matching debate hasn't moved on from Fuzzy matching (guesswork).
The problem, as I see it, is that Fuzzy matching is by its very nature an imperfect science, and the more sophisticated (complex) it gets, the more manual eyeballing of the search results it requires. This is very time-consuming and adds a lot of hidden expense to the cost of matching and cleaning data.
Academics around the world are looking for a technical solution to a problem that the human brain manages easily. For me, the answer is simple: human logic. Human logic is not programmed, it's not invented and it's not technical; it's a simple knowledge transfer.
I would rather see the industry as a whole move towards an intelligent approach and focus on intelligence-led matching.
What do I mean?
Well, let's take company names as an example. For my money, Fuzzy matching adds little value here.
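What's more important is being able to determine that GSK is an acronym for GlaxoSmithKline, that BBC stands for British Broadcasting Corporation and that UPS is United Parcel Service. A string-similarity score is unlikely to link "GSK" to "GlaxoSmithKline", because the two strings look almost nothing alike; the connection comes from knowledge, not spelling. These are just a few examples, but acronyms like these are prevalent throughout the business world.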
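To make this concrete, here's a rough Python sketch of one rule a matcher could apply: derive the initials from a full company name and compare them with a candidate acronym. The function names and the small stop-word list are hypothetical, for illustration only. A rule like this only covers acronyms built from initials; anything that doesn't follow that pattern would still need a curated lookup table.

```python
import re

# Hypothetical stop-word list: small words that are often dropped from acronyms.
STOPWORDS = {"of", "and", "the"}


def initials(full_name: str) -> str:
    """Return the upper-case initials of a name, splitting on spaces and on
    capital letters so that "GlaxoSmithKline" gives "GSK"."""
    words = re.findall(r"[A-Z][a-z]*|[a-z]+", full_name)
    return "".join(w[0].upper() for w in words if w.lower() not in STOPWORDS)


def is_acronym_of(candidate: str, full_name: str) -> bool:
    """True if the candidate (ignoring dots and spaces) equals the initials of full_name."""
    letters = re.sub(r"[^A-Za-z]", "", candidate).upper()
    return len(letters) > 1 and letters == initials(full_name)


if __name__ == "__main__":
    pairs = [
        ("GSK", "GlaxoSmithKline"),
        ("BBC", "British Broadcasting Corporation"),
        ("U.P.S.", "United Parcel Service"),
        ("GSK", "United Parcel Service"),
    ]
    for short, full in pairs:
        print(short, "->", full, is_acronym_of(short, full))
```

The point is that the rule encodes what a person already knows about how acronyms are formed, rather than scoring how similar two strings look.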
It's not just company names. What about people's names? The same problem exists: Dick is short for Richard, Larry is a nickname for Lawrence and Bob is short for Robert. Bill often refers to William, Liz to Elisabeth and Millie to Amelia. I could go on…
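A nickname rule is really just that knowledge transfer written down. Here's a minimal Python sketch using a hypothetical lookup table built from the examples above. A real table would be far larger, and one nickname can stand for more than one formal name, which is why each entry holds a set rather than a single name.

```python
# Hypothetical nickname table: each nickname maps to the set of formal names it can stand for.
NICKNAMES = {
    "dick": {"richard"},
    "larry": {"lawrence"},
    "bob": {"robert"},
    "bill": {"william"},
    "liz": {"elisabeth"},
    "millie": {"amelia"},
}


def name_variants(name: str) -> set:
    """Return the normalised name plus any formal names it is known to be short for."""
    key = name.strip().lower()
    return {key} | NICKNAMES.get(key, set())


def same_first_name(a: str, b: str) -> bool:
    """True if two first names share at least one known variant."""
    return bool(name_variants(a) & name_variants(b))


if __name__ == "__main__":
    print(same_first_name("Dick", "Richard"))  # True
    print(same_first_name("Bob", "Bill"))      # False
```

No amount of character comparison would tell you that Dick and Richard are the same person; a lookup like this does it in one step.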
It's amazing that one of the cornerstones of Fuzzy matching is phonetics, most commonly Soundex and various iterations on the theme. Believe it or not, Soundex was invented back in the early 1900s (http://en.wikipedia.org/wiki/Soundex). We are now well into the 21st century; let's start leveraging the technology of today rather than re-inventing the past.
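For anyone who hasn't looked under the hood, here's a minimal Python sketch of the American Soundex variant (the rules differ slightly between versions, so treat this as illustrative rather than a reference implementation). It keeps a name's first letter and encodes the next few consonants as digits, which copes with spelling variants but is blind to nicknames.

```python
# Letter-to-digit groups used by American Soundex; vowels, h, w and y get no digit.
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()       # keep the first letter as-is
    prev = CODES.get(letters[0], "")  # its digit still blocks an identical next digit
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:     # skip uncoded letters and repeated digits
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":            # h and w do not separate identical digits; vowels do
            prev = code
    return result.ljust(4, "0")       # pad short codes with zeros


if __name__ == "__main__":
    for a, b in [("Smith", "Smyth"), ("Robert", "Bob"), ("Richard", "Dick")]:
        print(a, soundex(a), b, soundex(b))
```

Running it shows the problem: Smith and Smyth both come out as S530, but Robert (R163) and Bob (B100), or Richard (R263) and Dick (D200), never meet, because the link between them is knowledge rather than sound.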
I will post more on this in the future, but feel free to ping me with your thoughts.
