Michael J. Swart

January 19, 2011

Ridiculously Unnormalized Database Schemas – Part One

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm
Ridiculously Unnormalized Database Schemas

So last week I introduced this series where I try to explore the different normal forms by breaking them in absurd ways. This time I’m talking about first normal form.

First Normal Form (1NF)

First normal form has a kind of fuzzy definition. A normalization process was mentioned by E. F. Codd in 1970 which came to be known as first normal form (Even though 1NF has been redefined and clarified a few times since) The paper is called A Relational Model of Data For Large Shared Data Banks. So in this article I’m going to deal with Codd’s description of 1NF in that paper. A database model is in 1NF if it is relational and has

“all relations are defined on simple domains”

Think of relations as tables and simple domains as column spaces that are atomic (i.e. can’t be decomposed). And so today’s ridiculously un-normalized database example is from that paper.
Good and bad relational schemas
So having tables as values inside tables is not something that SQL Server (or any RDBMS supports) but actually the example was given to explain how to normalize (to 1NF) and to avoid hierarchical data models that were common at the time. These hierarchical data models were and are implemented by systems like IMS and other systems that look like file systems. So in 1970, something like xml might have been easier to understand than relational data models in 1NF. The example above would actually be visualized like this in xml:

<Employee Man#="123" Name="Michael Swart" Birthdate="Nov. 22">
    <JobHistory JobDate="2000" Title="Lackey">
            <SalaryHistory SalaryDate="2000" Salary="$1,000,000" />
            <SalaryHistory SalaryDate="2001" Salary="$2,000,000" />
    </JobHistory>
    <JobHistory JobDate="2002" Title="Senior Lackey">
            <SalaryHistory SalaryDate="2002" Salary="$3,000,000" />
            <SalaryHistory SalaryDate="2003" Salary="$4,000,000" />
    </JobHistory>
    <Children ChildName="Mini-me 1" Birthyear="2000" />
    <Children ChildName="Mini-me 2" Birthyear="2002" />
</Employee>

XML Inside Databases

But in 2005, SQL Server introduced the xml data type. And my take is that it’s good… mostly. It’s as good as long as you treat as an atomic value (i.e. non-decomposable).

  • We can think of an xml column as just a string column with a constraint that makes it well formed xml. And that’s not terribly different than an integer column with a constraint that enforces that values are not negative.
  • XML columns are perfect for software messages (like SQL Server broker messages).
  • And they’re great for serialized objects (like query plans).

But …

XML is the most decomposable data type I can think of and when we treat these columns as non-atomic, then the tables that contain them are not in 1NF. It’s certainly not what E. F. Codd had in mind when he designed the relational data model.
An upset looking E. F. Codd
I’ve found that trouble comes when we try to query it or when we modify it in place. Because then we’re not treating the xml as atomic. And then the shortcomings of xml columns inside a database become very apparent. In SQL Server – as of 2005 – there are ways to query xml values or modify xml values such as when we use xml indexes and when we use XQuery. We find complicated code down this road and performance issues that are just as complicated.
For example:

What are your thoughts?

Do you have opinions on 1NF or the history of databases? It’s all fascinating stuff. Let me know in the comments what you think. I’ve got a bunch of ideas that can’t fit in one article and your comments might let me know what to continue writing about.

17 Comments

  1. Are you interested in giving a presentation titled “Ted Codd Hates That Thing You Just Did”, with pictures of grumpy Dr Codd throughout?

    Cause I’d pay money to see that one. 🙂 And I’m sure I’d learn a whole bunch of useful things, too.

    Comment by Kendra Little — January 19, 2011 @ 1:39 pm

  2. Oh, man that would be awesome.

    And if the world were made of me’s, there would be a market for a sitcom called “Ted Codd Hates That Thing You Just Did”. Maybe starring Wilford Brimley. I’d buy that whole series on DVD. 🙂

    Comment by Michael J. Swart — January 19, 2011 @ 1:46 pm

  3. Sometimes we have to store XML in the database because it’s the fastest way to return XML to a calling application. Shredding and building XML in SQL Server is computationally expensive. Storing it as XML is cheaper.

    There are a few reasons to store data in an XML data type:
    1) if we store it as NVARCHAR(MAX), some jerk might put text in that column and then things get messed up in the application
    2) the XML data type only allows valid XML
    3) we can index the XML so that we can query the random data from it
    4) it’s better than an entity attribute value table 🙂

    Comment by Jeremiah Peschka — January 20, 2011 @ 5:07 pm

  4. I think we’re on the same page Jeremiah.

    Every point you’re making is in-line with 1NF. The XML data type isn’t bad at all.
    I mean XML is a great data type being essentially a nvarchar(max) with a well-formed-xml constraint on it. And of course we wouldn’t want any jerks putting text in that column.

    But if we ever find ourselves modifying the xml values in place with XQuery. Or even modifying a set of XML values in place, then it gets pretty hairy.

    Your #3 is a kind of fuzzy area because it’s not treating the data as atomic. The definition of normalization doesn’t mention indexes at all, (being an ordering of data) but it certainly serves a business purpose. Who knows what Codd would say about that, but Michael J Swart gives it a thumbs up.

    In summary
    XML data type: Good!
    DB operations on XML data (like Xquery and shredding): <breathes in through teeth>

    Comment by Michael J. Swart — January 20, 2011 @ 5:53 pm

  5. HAHA! If you don’t like discussion about using XML, don’t attend my muddy data talk.

    I’m reading C. J. Date’s newest book and it’s interesting to read what he thinks of some of Mr. Codd’s ideas. One of Date’s more interesting points is that the reason to understand normalization is so that you know when not to normalize.

    Comment by Jeremiah Peschka — January 20, 2011 @ 6:10 pm

  6. Touche!

    In lieu of heckling you, I’m looking forward to Karen Lopez’s Database Design: Contentious Issues 🙂

    Comment by Michael J. Swart — January 20, 2011 @ 6:16 pm

  7. All this contention is so wonderful….

    Every design option has its place, given the right cost, benefit and risk. It’s only a problem when people use a solution for the wrong reasons, they are surprised when they don’t get the benefits they thought they were going to get.

    Thanks for the plug. I love doing that session. I hear people in CLE are especially contentious, so it should be good.

    Comment by Karen Lopez — January 20, 2011 @ 7:16 pm

  8. […] Ridiculously Unnormalized Database Schemas – Part One – Continuing his series on Normalization, it’s the second instalment from Michael J Swart, yeah I know it’s says Part One, go read the post fool. […]

    Pingback by SQL Server Blogs and news for the week 21/01/11 | John Sansom - SQL Server DBA in the UK — January 21, 2011 @ 6:48 am

  9. Hi,

    I was looking for such kind of blog for a long time which can explain normal form in clear and concise form.

    thanks man this is indeed very useful , waiting for next one.

    Javin
    mysql tutorial

    Comment by Javin @ Tibco RV Tutorial — January 22, 2011 @ 8:47 am

  10. Thanks Javin,
    I’m glad you like it. The Wikipedia pages explain the normal forms a bit better (or at least more thoroughly) than I do. But I’m having fun coming up with the examples.

    Comment by Michael J. Swart — January 22, 2011 @ 12:47 pm

  11. excellent series, thanks for publishing this

    Thomas

    Comment by Thomas LeBlanc — January 26, 2011 @ 12:34 pm

  12. Nested tables are part of standard SQL and some DBMSs do support them (Oracle for example). Relations are values and so there’s no fundamental reason why relations cannot be nested within relations. In principle therefore, nested relations are not a violation of 1NF which simply requires that each attribute consist of single values. Chris Date and others have argued convincingly that there is no sound reason for excluding certain types of value from the relational model – doing so would only weaken it rather than strengthen it.

    Comment by David — February 1, 2011 @ 4:14 am

  13. Thanks for stopping by David,

    I actually did not know that Oracle could store nested tables.

    But I knew this topic would come up. E. F. Codd’s definition of normal form was more of a process than an academic definition. And I knew Chris Date wrote about it and qualified that. That’s why I took my example straight from E. F. Codd’s paper. If that doesn’t violate 1NF then nothing does.

    But you’re right. “In principle therefore, nested relations are not a violation of 1NF” But all the same arguments for and against including XML values in SQL Server apply.

    Comment by Michael J. Swart — February 1, 2011 @ 8:24 am

  14. Well, here’s the fun part: SQL violates relational theory in a number of ways. NULLable datatypes are expressly disallowed. A relation variable contains an unordered set of tuples. Every tuple must have the same number of attributes and every attribute must have a value. A NULL isn’t a value, it’s an unknown (toaster != house cat). Therefore, any system that allows NULLs isn’t compliant with relational theory.

    Don’t let the implementation problems of SQL muddy the waters of relational theory. One is software, the other is math.

    Comment by Jeremiah Peschka — February 1, 2011 @ 8:30 am

  15. Hey Jeremiah,

    Thanks again for commenting (I think I will attend that muddy data talk of yours)

    Re: Software vs. Math
    That’s so true. I think I’m a pragmatic guy. And I’m 100% on board when you said it’s important to know these normal forms so that you know when not to normalize.

    It’s like chess, it’s a great rule of thumb to protect your King, control the center, develop the minor pieces and castle early. But knowing why they’re a rule of thumb lets you know when to break those rules.

    It’s just that following normal forms avoid so many problems. At work, I’ll come across 15 problems that could have been avoided with a normalized schema for every one problem that could have avoided with an unnormalized solution.

    Comment by Michael J. Swart — February 1, 2011 @ 9:22 am

  16. where are 2nd & 3rd normlization

    Comment by waqar — December 2, 2012 @ 8:30 am

  17. Hey waqar,
    There’s links at the top. I’m surprised you’re having trouble. Maybe I’m missing something.

    Comment by Michael J. Swart — December 2, 2012 @ 9:37 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress