Thursday, March 29, 2012

Getting a final version of a person into a DW

I have about 8 databases to integrate. All of the databases have ssno, address city...ect. I need to create a DW table with one unique record for each actual person. In other words,

Joe Smith,123 Main St, Anytown, State,....+ssno

goes into the DW table and is the same person as Joseph S. Smith,123 Main Street... and any other versions.

Could someone point me to a reference or give me an outline of how to do this in and SSIS package?

Is fuzzy logic used here?

Do I need to deduplicate the feeder systems first?

It needs to handle a situation in, for example, the Bronx New York where there could be an apartment buiding with 7 people named Jose Sanchez .

I hope I've been clear, I'm a newbie at this DW stuff, but it's fascinating. Any help would be appreciated. Thanks

Fuzzy could be of help here. The fuzzy grouping can be used for deduplicating, the fuzzy lookup to check whether you already have a record that resembles your new one.

You will need to spend some time (by testing) to figure out a similarity value that suits your situation. In a DW environment, I think this should be a business decision.

From what I understand, ssno probably needs to be involved. In the lookup (haven't used the grouping yet) you can set similarity values for ssno addressno and name. For example, ssno (or birthdate?) needs a similarity of 1, and the name needs to have a similarity > 0.6

Regards,

Pipo

No comments:

Post a Comment