Special Report: Match point

Good address tools can match incomplete or 'fuzzy' data and locate duplicates from the outset. Rob McLuhan reports.

The growing public disenchantment with direct mail means that data quality has never been more important. But with so many marketing channels available, it's easy for inaccuracies to creep in.

Names and addresses can be misheard by agents in call centres, or wrongly transcribed from illegible writing on forms and coupons. Sales personnel under pressure can easily make mistakes, as can consumers when filling out forms online.

Another headache arises from data being collected in non-standard ways, which makes it hard to process later. When QAS recently surveyed 800 companies, two-thirds said they capture customer data from three or more channels, but only 37 per cent do so in a standard format.

Good address tools, based on Royal Mail's Postcode Address File (PAF), can minimise these problems at the outset, and are becoming increasingly sophisticated. Products from suppliers such as QAS, Capscan and GB Group can be made available to call-centre operators and web users.

ChoicesUK, a home entertainment retailer, uses QAS's QuickAddress Pro to ensure that data captured at source is accurate. This is vital, as it rents many of its DVDs and videogames and is dependent on obtaining accurate customer addresses.

Graham Lyden, director of information services at ChoicesUK, says: "Before we introduced QuickAddress, we had no address-management solution in-store, so we had lots of badly-formatted, poor-quality addresses. Now we capture much better quality data first time." But many customers purchase items as gifts, and may not know the recipient's correct postcode. To overcome this, suppliers now offer a second option that enables the user to enter incomplete information and select from a graded set of possible matches.

A recent development is the merging of names with addressing tools, which is provided by a version of QuickAddress, using data from Experian. This can be used by agents in call centres, who can pick from a list to confirm a caller's name to ensure they get the spelling right.

With the rise of e-commerce and online banking, a huge area of growth in the last couple of years has been data capture from websites. The latest online software is designed to provide greater accuracy and aid users, but without making too many demands of them. It includes easy-completion drop-down boxes, forced validation and mandatory fields, as well as predictive input data.

But technology is only part of the solution. "Careful thought also needs to be given to the questions that the web form is asking, and call centres should be given an opportunity to ask questions that help validate information," says Terry Hiles, managing director of Capscan.

Getting the name and address right is only part of the story. The record may exist somewhere in the database, having been collected from a previous interaction. In that case, a duplicate will be created, which will have to be dealt with later. Ideally, address-management software could be linked to databases to search for potential duplicates at the point of capture, and this is on the horizon (see box above).

Companies are well advised to collect clean data at the point of capture. But that does not mean they can forget about batch cleansing and deduping before carrying out campaigns.

Even data gathered by postcoding systems will not necessarily be accurate. Royal Mail changes about 11 per cent of postcodes each year, and data deteriorates quickly as people move house and businesses change premises or ownership.

In practice, companies tend to be relatively lax about making data accurate at the collection stage, and expect the real work to take place later. But this can be a complex task. Some records will be so incomplete that it is not possible to fill in the missing fields. When this happens, an address tracking service, such as EuroDirect's CallTrace, can be used to locate the missing customers and prospects. This product is available to financial services companies that contribute to data pools.

Deduping software has to deal with many different formats and sources, and to reconfigure the data to make it mailable. This includes sorting contact names, splitting addresses into the correct fields and providing the right salutations.

Recent technology innovations have improved data accuracy. Software that incorporates 'fuzzy' matching can identify not only exact duplicates, but close matches, such as 'East Street Newsagent' and 'East Road Newsagent'.
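As a rough illustration of the idea, fuzzy matching can be sketched with Python's standard difflib module. This is not how any of the products named here work internally; the function names and the 0.75 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_close_match(a: str, b: str, threshold: float = 0.75) -> bool:
    """Flag two strings as a possible duplicate (threshold is illustrative only)."""
    return similarity(a, b) >= threshold


# The article's example pair is flagged as a close match, an unrelated name is not.
print(is_close_match("East Street Newsagent", "East Road Newsagent"))
print(is_close_match("East Street Newsagent", "High Street Bakery"))
```

A close match on one field is only a candidate. In practice the threshold would need tuning against the dataset in question.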

Similarly, phonetic matching corrects names that have been recorded incorrectly in call centres. This will clear up cases where an 's' has been heard as an 'f', producing Falwell instead of Salwell, or where 'Shaw' has been recorded as 'Shore'.
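One long-established phonetic technique is Soundex, which reduces a name to a letter and three digits so that similar-sounding spellings share a code. The sketch below is a minimal version of standard American Soundex, offered as an example of the general approach rather than as the algorithm any supplier uses.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    chars = [c for c in name.lower() if c.isalpha()]
    if not chars:
        return ""

    result = chars[0].upper()
    prev = codes.get(chars[0], "")
    for ch in chars[1:]:
        code = codes.get(ch, "")
        # Append a digit unless it repeats the previous coded consonant.
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate repeated codes; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))      # same code for both spellings
print(soundex("Falwell"), soundex("Salwell"))  # codes differ in the first letter
```

Plain Soundex keeps the first letter of the name, so it groups spellings such as Smith and Smyth but would not, on its own, pair Falwell with Salwell. Catching mishearings of that kind needs richer rules than this sketch provides.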

These components are now incorporated into addressing tools, to help fill in the gaps, but they really come into their own in batch cleansing.

HelpIT uses fuzzy and phonetic algorithms to match records by the way words sound and by common variations, rather than by the precise spelling of fields. When Vitalise, a national charity for the disabled, switched to this software, it found about 50 per cent more potential duplications between its two databases than in previous cleansing exercises.

"Most systems offer a direct match on postcode and surname, but that is not enough," says Chris Cuffe, managing director of helpIT. "The data can be entered many different ways, with or without gaps in the postcode, for instance, or the surname misspelled. The chances of finding a precise match with such a small amount of data is very limited. So you need this clever technology to find records."

Fuzzy and phonetic matching are key components in data integration company DataFlux's product dfPower Studio, which has been used by The Number to cleanse 210 million listings in its 118-118 directory service. First, the tool identifies close matches that may indicate a duplicate, then it examines other fields within the data record, such as the postcode and telephone number, to be sure of a match.
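The two-stage approach described here can be sketched as follows. This is a simplified illustration, not dfPower Studio's logic: the record layout, the 0.75 name threshold and the rule that either a matching postcode or a matching phone number confirms the duplicate are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Fuzzy similarity of two names, ignoring case and extra spaces."""
    return SequenceMatcher(None, " ".join(a.lower().split()),
                           " ".join(b.lower().split())).ratio()


def compact(value: str) -> str:
    """Strip all whitespace and upper-case, so 'sw1a 1aa' equals 'SW1A1AA'."""
    return "".join(value.split()).upper()


def digits_only(value: str) -> str:
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in value if ch.isdigit())


def is_duplicate(rec_a: dict, rec_b: dict, name_threshold: float = 0.75) -> bool:
    # Stage 1: a close match on the name marks the pair as a candidate.
    if name_similarity(rec_a["name"], rec_b["name"]) < name_threshold:
        return False
    # Stage 2: confirm the candidate using other fields (assumed rule: either one).
    pc_a, pc_b = compact(rec_a["postcode"]), compact(rec_b["postcode"])
    ph_a, ph_b = digits_only(rec_a["phone"]), digits_only(rec_b["phone"])
    same_postcode = bool(pc_a) and pc_a == pc_b
    same_phone = bool(ph_a) and ph_a == ph_b
    return same_postcode or same_phone


a = {"name": "East Street Newsagent", "postcode": "SW1A 1AA", "phone": "020 7946 0000"}
b = {"name": "East St Newsagents", "postcode": "sw1a1aa", "phone": ""}
print(is_duplicate(a, b))
```

Requiring a second field to agree keeps similar names at different addresses from being merged by mistake.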

The software is also active at the point of capture, to ensure that all details, such as company initials, business names, addresses and numbers, are entered in a standardised format. Andrew Larter, 118-118 data and development director, says: "Because we take hundreds of millions of calls, even the smallest errors are unacceptable. So our data-processing volume and error correction operates on a very large scale. After we implemented the dfPower Studio solution, we were able to totally eliminate the risk of bad data."

A recent development is the use of additional files, such as historical data, to identify further information about individuals. The aim is to become more accurate in deduplication by using past behaviour as an indicator for future changes.

However, it's unlikely that the business of ensuring data accuracy will ever be an exact science. The increasing number of data sources, communication channels and cleansing tools means it is constantly evolving. Nor does it depend entirely on technology. Specialists can also help clients understand the implications of subtle changes to matching criteria and the relevance for their particular dataset. That's a skill that can never be automated, and which, ultimately, raises the effectiveness of deduping routines.


Arjan Dijk, director of marketing, Capital One

"We regularly apply de-duplication rules within and across our data sources, and re-link our data to ensure we have the most up-to-date details for potential and existing customers."

Tim Pottinger, database solutions director, EuroDirect

"Rectifying incorrect data once it has made its way to your database is not an impossible undertaking, but it is a substantial and complex one to achieve."

David Laybourne, technical director, DPS Direct Mail

"There needs to be a balance between generating accurate data and making the online process too onerous for customers to follow. Get it wrong either way and the price can be high."

David Green, business development director, GB Group

"Keeping customer data up to date is now far more challenging, due to media fragmentation and the fact that many businesses now have multi-channel outlets."

NEED TO KNOW: New deduping and cleansing tools

Postcoding solutions are well established to ensure accuracy in data capture. However, matching to the Postcode Address File (PAF) does not prevent you from creating a new record when one already exists somewhere on the database.

The customer may have given their details before, or may be unaware that they are dealing with the same company under a different brand name.

Companies should be looking to prevent duplication at the point of capture. This can be done by linking postcoding software to customer databases and CRM systems, which allows companies to check for duplicate records in real time.
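A point-of-capture check of this kind might look like the sketch below. An in-memory dictionary stands in for the customer database or CRM system, and the class name, the choice of surname plus postcode as the lookup key and the customer IDs are all hypothetical. The only domain fact relied on is that the inward part of a UK postcode is always its last three characters.

```python
def normalise_postcode(raw: str) -> str:
    """Upper-case a UK postcode and put one space before its last three characters."""
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:  # too short to be a full postcode; leave unsplit
        return compact
    return compact[:-3] + " " + compact[-3:]


class CustomerIndex:
    """Toy stand-in for a customer database, keyed on surname and postcode."""

    def __init__(self) -> None:
        self._by_key = {}

    @staticmethod
    def _key(surname: str, postcode: str) -> tuple:
        return (surname.strip().lower(), normalise_postcode(postcode))

    def find_existing(self, surname: str, postcode: str):
        """Return the ID of a record with the same key, or None."""
        return self._by_key.get(self._key(surname, postcode))

    def add(self, customer_id: str, surname: str, postcode: str) -> str:
        """Store the record unless its key already exists; return the ID kept."""
        existing = self.find_existing(surname, postcode)
        if existing is not None:
            return existing
        self._by_key[self._key(surname, postcode)] = customer_id
        return customer_id


index = CustomerIndex()
print(index.add("C001", "Shaw", "sw1a1aa"))     # new record
print(index.add("C002", " shaw ", "SW1A 1AA"))  # same key, so the first ID is kept
```

This only catches records that agree exactly once formatting is normalised. Misspelled names would still slip through without fuzzy or phonetic matching on top.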

Capscan's latest products support integration with client databases, although this is something companies need to do in-house. QAS says it too is looking at the possibility of linking its products with client databases.

The new record could be matched by address only, or by other information such as account name or date of birth. However, recognising a duplicate is not always that easy. "The technology would have to be fairly sophisticated in the way it finds matches to data with different formats and spellings," says Helen Roy, product marketing manager for QAS.

More businesses are looking for data services to be delivered over the internet. GB Group expects most address cleansing and capture installations to move online over the next five years.

It recently launched an Accelerator Online service, powered by a free-format, address-matching engine. This ensures that contact details supplied by customers online are fed back to companies in standardised UK and international address formats.

GB Group claims that the operational benefits are too great for businesses to ignore, and that this represents a new departure in industry thinking.

CASE STUDY: Wolters Kluwer

Brief: To dedupe customer data from disparate files

Supplier: SAS

Cleansing and deduping is a priority for Wolters Kluwer, a global publisher specialising in HR, legal and other business information. It has grown by acquisition, leading to a proliferation of databases, while customer data continues to be recorded in a variety of ways, at numerous touch points.

However, efficiencies were being compromised because of the different ways the data was being gathered.

Business systems manager Mike Turner says: "Straightforward matching is not a problem, but we found it difficult to explain the intricacies of how we collect data to our outsourced supplier, which meant possible matches were being missed. We decided we would get a better result by acquiring an appropriate tool and using it internally."

Using an integrated SAS solution, Wolters Kluwer has built eight match keys that score potential duplicates according to the quality of the match. Thresholds have been set to establish a definite 'no' or 'yes', while the 'not sures' are placed in a separate quarantine file for manual examination.
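The routing of scored pairs into 'yes', 'no' and quarantine can be sketched in a few lines. This is not Wolters Kluwer's SAS implementation: the 0-to-1 score scale, the two threshold values and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

YES_FROM = 0.90  # at or above this score, treat the pair as a definite duplicate
NO_BELOW = 0.60  # below this score, treat the pair as definitely distinct


def route(score: float) -> str:
    """Classify a match score as 'yes', 'no' or 'quarantine'."""
    if score >= YES_FROM:
        return "yes"
    if score < NO_BELOW:
        return "no"
    return "quarantine"


def partition(scored_pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Group candidate pair IDs by outcome; quarantined pairs go to manual review."""
    buckets: Dict[str, List[str]] = {"yes": [], "no": [], "quarantine": []}
    for pair_id, score in scored_pairs:
        buckets[route(score)].append(pair_id)
    return buckets


result = partition([("pair-1", 0.95), ("pair-2", 0.30), ("pair-3", 0.75)])
print(result)
```

Moving the two thresholds closer together shrinks the quarantine file but raises the risk of wrong automatic decisions, which is the trade-off the manual review is there to manage.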

Lessons learned from analysing quarantined records can take the deduping process one step further. Knowing the kind of errors that sales personnel are likely to make when capturing records means that rules can be created specifically for data from that source. This allows the company to eliminate a large amount of duplicate data.

The file of 205,000 businesses that Wolters Kluwer believed it was trading with has been cut to 168,000, a reduction of 18 per cent. This, and enhanced targeting, has led to a three-fold cut in its campaign costs.
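The 18 per cent figure follows directly from the two file sizes quoted, as this short check shows (the variable names are ours):

```python
before = 205_000  # businesses on file before deduping
after = 168_000   # businesses on file after deduping

removed = before - after
reduction_pct = 100 * removed / before

print(removed)               # records removed as duplicates
print(round(reduction_pct))  # percentage reduction, rounded to a whole number
```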

"That's a substantial result," says Turner. "We knew two years ago that there was a problem, but we just couldn't locate it. Now we are starting to cut the excess costs out of the business."