While networking recently, I was having a drink with a fellow data architect and, dare I say, friend, Bob Conway.  During the conversation, while we were discussing some of the impractical things clients sometimes think they want us to do, Bob reminded me of Will Smith’s “means and will” test.  This triggered a flood of memories, nostalgic feelings and gratitude that in 1992 or maybe 1993 I had the good fortune to attend “Applied Data Modeling”, a class taught by William G Smith.  The distinctions I learned in that one class fundamentally changed the course of my career and have served me faithfully for the last 24 or so years.

Prior to that class I had been working with databases for several years, mostly PC databases: first dBase and then Paradox.  While I was the communications officer attached to the USS Comstock, one of my collateral duties was “Data Processing Officer”.  I was responsible for all the computer assets and the security of those assets.  I had created several database applications for the ship: one to track and manage guests coming to the commissioning of the ship, and another to track and manage the security clearances of the personnel onboard.  On occasion I was able to see what a friend was doing to evaluate and track the status of commercial mortgages.

Prior to the class, it was mostly intuition and trial and error.  I didn’t know what I didn’t know.  I had rarely, if ever, seen a data model.  After the class it was like turning on a light in a dark room.  Better to light a single candle than curse the darkness and all that.

The data modeling diagrams and tools took me back to my days as an electrical engineering student.  We would design, document, and simulate circuits with Computer Aided Design (CAD) software.  I loved doing that, and I had missed it while doing four years in the Navy after graduation to pay for school.  After four years in the Navy my electrical engineering skills were decidedly stale, so I had gone into IT.  Thank God for data modeling and data modeling tools.  They gave me my engineering design fix in the domain of data.

Anyway, back to “Applied Data Modeling”.

I’m sure I’ve forgotten many, if not most, of the details of the class, but here is what has stuck with me:

  1. The basics: Entities, attributes, keys, roles and relationships all represented on a diagram/picture.  Logical to physical to DDL.  The tool generates the DDL!
  2. A key needs to be stable, unique, not null and minimal.
  3. A company can only track in a database system the pieces of information it has the means and the will to collect.
  4. It can cost millions of dollars to fix a bad data model/database design after the fact.
    Two examples I still remember over 20 years later: 
    1.  Hallmark Cards.  When I was a kid, and probably into my adult years, you could turn over a Hallmark greeting card and on the back, embedded in a string of letters, were two digits, a space, and two more digits.  Even as children we learned that these four digits were the price of the card: two digits for dollars and two digits for cents.  AMXTTYP 02 75 TTXXMIR = $2.75.  What I learned in class that day was that the entire string was actually the primary key of the product, AKA the “Product ID” or “Product Key”.  So what is the big deal, you might ask?  We did ask in class, and then we realized that each price change for a card required it to have a new Product ID.  And if you have a new Product ID, you have to have all new information related to that product, even though the only thing that had changed was the price.  I don’t remember the exact ratio, but it seems like more than 50% of the data in the system was due to price changes and the ripple effect of copying all the data associated with each card just for a price change.  All because the primary key for the product was not stable or minimal.
      Not only that, the price was limited to a maximum value of $99.99.  A reasonable assumption for greeting cards, but when Hallmark later wanted to get into the business of selling gifts, there was a time when no item in the store could be sold for more than… you guessed it, $99.99, because this limitation was built into the sales and inventory system and it would have been extremely expensive and risky to change.  It was cheaper and less risky to start a whole new company, complete with new computer systems and programs, than it was to alter the legacy computer programs with a four-digit price built into the Product ID.
    2. A Scandinavian country chose to embed the gender of a person into its equivalent of our Social Security number.  The data designers did not anticipate the transgender movement.  A person who had gone through the process of changing their gender did not want an ID number that still declared their former gender.
      But by now their ID number was spread all over the country. Think about all the places an American must use their SSN and the impact of needing to allow someone to change their SSN.  Do we try to correct all the old records?  Create a map from old to new?  How would you do either and what would it cost?
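
To make the Hallmark lesson concrete, here is a minimal sketch in Python.  It is only an illustration under my own assumptions, not Hallmark’s actual system: the `smart_key` function, the `Product` class, and the exact key layout are hypothetical names invented for this example.  It contrasts a key with the price embedded in it against a stable surrogate key where price is just an attribute.

```python
from dataclasses import dataclass, field
from decimal import Decimal


def smart_key(prefix: str, price: Decimal, suffix: str) -> str:
    """Hypothetical 'smart key': the price is embedded in the product ID."""
    # Only four digits are available, so the key itself caps the price.
    if not Decimal("0") <= price < Decimal("100"):
        raise ValueError("four price digits cannot represent this price")
    dollars, cents = divmod(int(price * 100), 100)
    return f"{prefix} {dollars:02d} {cents:02d} {suffix}"


@dataclass
class Product:
    """Alternative sketch: a stable surrogate key, price kept as an attribute."""
    product_id: int                                       # stable, unique, minimal
    description: str
    prices: list[Decimal] = field(default_factory=list)   # price history

    def change_price(self, new_price: Decimal) -> None:
        # A price change adds a row of history; the key is untouched.
        self.prices.append(new_price)

    @property
    def current_price(self) -> Decimal:
        return self.prices[-1]


# With the smart key, a price change produces a different product ID,
# so everything that references the old ID has to be copied or remapped.
old_key = smart_key("AMXTTYP", Decimal("2.75"), "TTXXMIR")
new_key = smart_key("AMXTTYP", Decimal("2.95"), "TTXXMIR")

# With the surrogate key, the same product simply gets new prices,
# including one above $99.99.
card = Product(product_id=1, description="Birthday card", prices=[Decimal("2.75")])
card.change_price(Decimal("2.95"))
card.change_price(Decimal("125.00"))
```

In the first design `old_key` and `new_key` differ even though only the price changed, and a $125.00 item cannot be keyed at all.  In the second, `card.product_id` never changes no matter what happens to the price.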

The other thing I am eternally grateful to William Smith for is his example of the value of a good data consultant and what they can get paid.  The class I took in 1992/1993, “Applied Data Modeling”, cost the company I worked for $1,000 for 5 days of training, a very modest $200/day.  If memory serves, we had approximately 22-25 people in the class, so let’s say a gross of $22,000 for 5 days.  Let’s say he was able to cap his expenses at $2,000 for the week; then William G Smith and Associates brought home $20,000 for 5 days of training, or $4,000/day.

If you check his website (http://www.williamgsmith.com/) you will see that he offers his training at a fixed $4,000/day plus expenses: “We price our in-house (at client site) training on a per day basis, not a per student basis, to encourage our clients to train more people for less cost than most training providers”.

To paraphrase Tony Robbins… $4,000/day, you can’t live on that, but it’s a start.