Wednesday
Jan 05 2011

Expectations and predictions for MMXI

It has been a long while since I posted something on my blog, but with the New Year I have committed myself to posting more often. Consider it one of my New Year's resolutions.

Several well-respected experts have already posted their predictions, among them Jill Dyché and Jim Harris. Below are my expectations and predictions for the upcoming year regarding MDM for customer data, grouped into six items. Next year I will check whether I predicted the future correctly. In the meantime, feel free to comment.

 

1. Entry of one or two new players in Gartner MQ and/or Forrester Wave.

After years of consolidation in the market for DQ and MDM vendors, opportunities are arising for new players. I expect to see Talend take serious steps forward. At the very end of 2010 they acquired Sopera, which will let them extend their capabilities in real-time data integration. I expect them to enter the Gartner MQ for MDM of Customer Data in the niche player quadrant. We will not see any new entries in the visionary quadrant. Further consolidation will also take place. For example, I really like Siebel/Oracle UCM, but do we need four or five different Oracle offerings in the MDM arena?

 

2. More emphasis on MDM in the Cloud

Personally, I strongly believe this is the only way to go in the end. I still don't grasp why a company should implement an MDM product itself. In most cases those companies are much better equipped to provide banking services, sell insurance policies or produce medicines. Why should we bother them with managing their master data? It should be provided to them as a service. More and more vendors are moving towards SaaS offerings, but it will take another two or three years before they reach the level of, for instance, Salesforce.com.

 

3. Eye-pleasing user interfaces

I have seen many BI, CRM, MDM and DQ tools by now. For almost all of them you need special training to understand how to configure the tool. Even the applications for the Data Stewards require the end user to have at least a degree in rocket science. I predict we will see a convergence in user interfaces for handling rejections and dubious duplicates. I won’t rest until I am able to do it on an iPad, without any instruction or manual. By the way, I am able to do pretty cool things on an iPad without ever using the F1 button.

 

4. Matching of customers using more and more attributes

Identity resolution, or matching, remains an important functionality within an MDM suite. But the focus will shift from the well-known (fuzzy) matching on name, address, phone number, SSN, bank account, email etc. towards new attributes. Why don’t we use Skype, LinkedIn, Facebook, Twitter and Foursquare handles to identify persons? The main reason is that our current CRM applications are struggling to cope with these “social media” instruments. In other words, we are not yet able to capture this information and use it for matching purposes. In 2011 we will see a maximum of three vendors that will be able to incorporate these social media attributes in their data models.
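To make the idea a bit more concrete, here is a minimal Python sketch of matching that treats a shared social media handle as strong evidence and falls back to fuzzy name matching. The record layout, the field names, the 0.85 threshold and the sample records are all my own assumptions for illustration; no vendor's data model looks exactly like this.

```python
from difflib import SequenceMatcher

# Assumed cut-off for calling two names "the same"; tune it on real data.
NAME_THRESHOLD = 0.85


def normalize(value: str) -> str:
    """Trim whitespace, drop a leading '@' and lower-case a name or handle."""
    return value.strip().lstrip("@").lower()


def shared_handle(a: dict, b: dict) -> bool:
    """True if both records carry the same handle on the same network."""
    handles_a = a.get("handles", {})
    handles_b = b.get("handles", {})
    return any(
        network in handles_b
        and normalize(handle) == normalize(handles_b[network])
        for network, handle in handles_a.items()
    )


def name_similarity(a: dict, b: dict) -> float:
    """Fuzzy similarity of the two names, between 0.0 and 1.0."""
    return SequenceMatcher(
        None, normalize(a["name"]), normalize(b["name"])
    ).ratio()


def is_probable_match(a: dict, b: dict) -> bool:
    """A shared handle wins outright; otherwise fall back to the name."""
    return shared_handle(a, b) or name_similarity(a, b) >= NAME_THRESHOLD


# Hypothetical sample records, not real people.
crm_record = {"name": "Jon Smith", "handles": {"twitter": "@jsmith_nl"}}
web_record = {
    "name": "J. Smith",
    "handles": {"twitter": "jsmith_NL", "skype": "jon.smith"},
}
other_record = {"name": "Maria Jansen", "handles": {"twitter": "@mjansen"}}

print(is_probable_match(crm_record, web_record))
print(is_probable_match(crm_record, other_record))
```

The point of the sketch is that the handle comparison is exact after normalization, so it can link two records whose names are written differently, which is where classic fuzzy matching tends to struggle.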

 

5. Privacy and opt-in registration will take off

More and more, we will see end consumers managing their own data profiles. In 2010 we saw that Facebook had some issues with privacy. We will see the same issues in MDM environments. The only way to overcome this is to make sure that the end consumer can enter his or her own details. In the end, he or she is the only one who knows the real truth.

 

6. Reference data becomes a commodity or even obsolete

Can someone explain to me why we still use (expensive) address/postal data if I can use a Google Map API for free with the same results? And why would you pay for algorithms which check whether a Social Security Number or Bank Account Number is valid, if those algorithms are available in the Open Source community? Google, Bing and WolframAlpha are making huge steps in indexing reference data. Crowdsourcing will also become a viable alternative. In the Netherlands we already have a crowdsourced alternative for address/postal data. It is even more up to date than the official sources. Another example is Jigsaw.com. Such initiatives will receive more VC funding in 2011.
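As an illustration of how little code such a freely available check needs, here is a sketch in Python of the mod-97 test used for IBAN bank account numbers. I am taking the IBAN as my example; the function name is my own, and the test only confirms that the number is well-formed, not that the account exists.

```python
def iban_is_valid(iban: str) -> bool:
    """Mod-97 check for an IBAN: structure only, not account existence."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and the two check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace every letter by a number: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A well-formed IBAN leaves remainder 1 when divided by 97.
    return int(digits) % 97 == 1


# A widely published example IBAN, and the same number with a wrong check digit.
print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))
```

A production check would also verify the country-specific length and layout, which this sketch leaves out.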

Wednesday
Apr 14 2010

How to pronounce those difficult names

Nowadays more and more people have uncommon names which are difficult to pronounce. Or vice versa: if you hear an unfamiliar name, it is probably hard to spell it correctly.

I came across two interesting sites which will help you with the pronunciation of names. Hearnames has a large database where you can get a sound clip of a certain name in just a few clicks. They have ordered the names by region or nationality, and the number of countries is still growing. The site also has links to other interesting sites if you are into given names, surnames etc.

A similar site is Pronounce Names. It is not as user-friendly, and it looks like the database is not as extensive. What I do like is that it is set up with different contexts in mind. Even if names are spelled exactly the same, they are pronounced differently depending on the background of the "sender". This site stores all the different pronunciations.

Two different sites, two different approaches, similar content. Combine them with sites like Behind Names, The World Names Profiler, and Family Search (find your ancestors) and their new prototype search engine, put Wolfram Alpha on top for the visualization and spectacular infographics, and you have a hit.

Tuesday
Feb 16 2010

Identifying by association

A few weeks ago we were playing Trivial Pursuit and came up with a slightly different game. 

Can you name the right person using associations? The game only works with famous or well-known people. Important rule of the game: the association should result in just one undisputed name of a person. If different people can be linked to the association, then it is not regarded as a correct association.

The fewer characters needed, the better. The ultimate goal is to identify a person by an association with the fewest characters possible. We came up with four characters. If you have other associations or can go under four characters, please let us know via the comments.

 

Famous quotes

1. Ich bin ein Berliner (20 characters)

2. I have a dream (14 characters)

3. Veni Vidi Vici (14 characters)

 

By company

4. Microsoft (9 characters)

5. Apple (5 characters)

 

Well-known dates

6. 25/12/00 or 12/25/00 (8 characters) 

7. 40-45 (5 characters)

8. 1492 (4 characters)

 

I am curious to hear other examples, so please fill the comments with interesting ones. If the examples were too difficult, please click on the image for the answers.

Saturday
Feb 13 2010

Town logo looks like a vagina

In the north of Holland a new municipality has been established, consisting of the towns Winschoten, Scheemda and Reiderland. Many inhabitants don't like the new logo; in their opinion it looks too much like a vagina. Or, as they call it, 'een kut logo'. Please feel free to use Google Translate or Babel Fish to get a translation.

Thursday
Feb 11 2010

Categorizing Identifying Attributes of People

Creating or consolidating the various source customer records into the so-called Golden Records requires that decisions are made about which attributes will be included in the Golden Record. 

I always start with the identifying elements of the customer. But there are many, sometimes a great many, attributes which identify the customer. To make it easier to decide which will be included in the Golden Record, it is advisable to categorize the elements (fields/attributes). After some thought I came to the following categorization:

 

1. Physical

In this category I put elements like fingerprints, iris scan, gender, blood type and DNA profile: in fact, all biometric elements which don't change often over time. Body scanners are supposed to be the next biometric identification system. The USA has already budgeted to buy 1,000 scanners this year. Maybe we will then be able to identify the crotch Christmas Bomber. Btw, transgender surgery is still not very common, so gender still fits in this category.

It is mainly governments that use identifying elements like these. You see them in your passport or driving license, or are confronted with them at airports.

2. Given by birth

These are elements like birth date, place of birth, surname, given name etc. In most cultures and countries, attributes like these don't change during a lifetime. At least, most of them don't. Of course marriage and divorce are spoilers in this matter. And in some countries your surname changes when you become a parent. Your country and place of birth don't change either, at least if you were not born in, for example, Yugoslavia.

3. Assigned by others

Most countries use a system of Social Security Numbers. These elements are characterized by the fact that you as a person are not able to influence them. You are not able to choose your own SSN. I have been told that in the Netherlands only two people have the algorithm for creating the so-called BSNs. Yes, we are able to validate them, but generating them is another story. Btw, I don't consider putting a Cray computer to work a viable option for generating those numbers.

Another example is your army number. Older members of my family still know their number by heart. Bank account, telephone and credit card numbers can also be classified as "Assigned by others", although you certainly have freedom in choosing your bank and telco provider. In that way you are able to influence them a little bit.
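The validation of a BSN mentioned above is a simple weighted checksum, usually called the 11-test ("elfproef"). Below is a small Python sketch of it. The function name is mine, and the sample numbers are arbitrary test values chosen only because they pass or fail the checksum. Passing the test says nothing about whether a number was ever issued.

```python
# Weights for the nine digits of a BSN; note the -1 for the last digit.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def bsn_passes_eleven_test(bsn: str) -> bool:
    """Checksum-only validation of a nine-digit BSN."""
    if len(bsn) != 9 or not (bsn.isascii() and bsn.isdigit()):
        return False
    total = sum(weight * int(digit) for weight, digit in zip(BSN_WEIGHTS, bsn))
    # The weighted sum must be divisible by 11.
    return total % 11 == 0


# 9*1 + 8*1 + 7*1 + 6*2 + 5*2 + 4*2 + 3*3 + 2*3 - 1*3 = 66, a multiple of 11.
print(bsn_passes_eleven_test("111222333"))
# Weighted sum is 147, which is not a multiple of 11.
print(bsn_passes_eleven_test("123456789"))
```

Roughly one in eleven random nine-digit strings passes this test, so it catches typing errors but is no proof of identity.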

4. Chosen (more or less) by yourself

An example in this category is an email address. Long gone are the days when your ISP assigned an address to you. Nowadays you may choose your own email handle. You might also say that you choose the home where you live, and thereby your own home address.

5. Pseudonyms

Although the trend is that we use our own names more and more, it is still very common to use pseudonyms. On Hyves, Facebook, Twitter and blogs you see a lot of fake names (e.g. Fake Steve Jobs). My family has an Indonesian background, and I always found it peculiar to hear names for family members other than their official names. They weren't even abbreviations, but complete fantasy names.

6. Not aware of or hidden for yourself

Last but not least, we can identify people by means they are not aware of. It is obvious that the Google bots leave more cookies behind and scan more MAC and IP addresses than we can imagine. Your mobile phone also hides all kinds of codes with which they are able to track you.
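For discussions with IT people, the six categories above can be written down as a simple lookup table. The Python sketch below does that with the examples from this post; the attribute names and the helper function are my own choice, and the list of attributes is of course far from complete.

```python
from enum import Enum


class Category(Enum):
    """The six categories, numbered as in the list above."""
    PHYSICAL = 1
    GIVEN_BY_BIRTH = 2
    ASSIGNED_BY_OTHERS = 3
    CHOSEN_BY_YOURSELF = 4
    PSEUDONYM = 5
    HIDDEN = 6


# Attribute names are illustrative; they follow the examples in the text.
ATTRIBUTE_CATEGORY = {
    "fingerprint": Category.PHYSICAL,
    "iris_scan": Category.PHYSICAL,
    "gender": Category.PHYSICAL,
    "blood_type": Category.PHYSICAL,
    "dna_profile": Category.PHYSICAL,
    "birth_date": Category.GIVEN_BY_BIRTH,
    "place_of_birth": Category.GIVEN_BY_BIRTH,
    "surname": Category.GIVEN_BY_BIRTH,
    "given_name": Category.GIVEN_BY_BIRTH,
    "social_security_number": Category.ASSIGNED_BY_OTHERS,
    "army_number": Category.ASSIGNED_BY_OTHERS,
    "bank_account_number": Category.ASSIGNED_BY_OTHERS,
    "telephone_number": Category.ASSIGNED_BY_OTHERS,
    "credit_card_number": Category.ASSIGNED_BY_OTHERS,
    "email_address": Category.CHOSEN_BY_YOURSELF,
    "home_address": Category.CHOSEN_BY_YOURSELF,
    "social_network_name": Category.PSEUDONYM,
    "cookie": Category.HIDDEN,
    "mac_address": Category.HIDDEN,
    "ip_address": Category.HIDDEN,
}


def attributes_in(category: Category) -> list:
    """All known attribute names in one category, in alphabetical order."""
    return sorted(
        name for name, cat in ATTRIBUTE_CATEGORY.items() if cat is category
    )


print(attributes_in(Category.ASSIGNED_BY_OTHERS))
```

With such a table in place, rules can be attached per category instead of per attribute.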

 

OK, a categorization is nice, and these were just a few examples. But why bother, you might ask. I found it helpful in discussions with IT and business people. Each category has its own characteristics and, for instance, its own decay rate. For example, your birth date won't change over time, but it is very likely that your home address is outdated after 7 years. In Holland, on average 15% of the people move each year. In other words, the categories differ in terms of difficulty to obtain, difficulty to maintain, usability and decay rate. It also helps to start discussions about data ownership and to define data governance rules for specific sets of attributes and data.

I am still looking for a simple way to visualize this.
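The decay rate can be turned into a quick back-of-the-envelope calculation. The Python sketch below assumes, purely for simplicity, that every person has the same 15% chance of moving each year, independent of earlier years. Real moving behaviour is not that uniform, so read the outcome as an order of magnitude. The function name is my own.

```python
def share_still_valid(annual_change_rate: float, years: int) -> float:
    """Share of records still unchanged after a number of years,
    assuming the same independent change rate every year."""
    return (1.0 - annual_change_rate) ** years


ANNUAL_MOVE_RATE = 0.15  # the 15% per year mentioned above

for years in (1, 3, 5, 7):
    share = share_still_valid(ANNUAL_MOVE_RATE, years)
    print(f"after {years} years: {share:.0%} of home addresses unchanged")
```

Under this assumption about 32% of the home addresses are still correct after 7 years (0.85 to the power 7), and half of them are outdated somewhere between four and five years.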