Xitcod13

Reputation: 6079

Storing an email address as three attributes versus one (for statistical purposes)

I have a database with a table email_eml that stores three attributes: name_eml, host_eml and domain_eml. These hold the email name, the name of the website, and the domain name (.com, .net, etc.). It doesn't store the @ or the . in any of the columns. This gives me some flexibility (for example, checking the average name length before the @ symbol will be faster). I can collect statistics on the email name, and I can also create usernames from the name_eml attribute. However, it is a burden to handle when people submit their email or when I have to compare a whole email. The alternative, a single column, would make me store the additional @ and . symbols and separate out the name in a script whenever I want to collect statistics.

I wonder if it's better to store the email in a single column instead of three. Is one of the approaches more proper or more normalized?

I would like the answer to include the pros and cons of both approaches to storing email addresses (even if storing them in three columns doesn't have many pros).

Upvotes: 0

Views: 114

Answers (3)

Chris Trahey

Reputation: 18290

In terms of normalization, once you break apart the common parts (such as the host and especially the top-level domain), they should be modeled as foreign-key relationships. So you end up with three tables:

  1. emailNames
  2. emailHosts
  3. emailTLDs

emailNames then has three columns:

  1. emailName
  2. hostID
  3. tldID

Note that I used "TLD" because it is likely the only part of the host name with significant overlap between addresses, and you can expect "." characters in hostnames before the start of the TLD.
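As a rough sketch of that layout, the snippet below builds the three tables in an in-memory SQLite database through Python's sqlite3 module. The table names and the emailName, hostID and tldID columns come from the lists above; the host and tld columns, the composite primary key, and the add_email helper are assumptions made for illustration. It splits on the last "@" and the last ".", so a host such as mail.example keeps its inner dot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE emailTLDs  (tldID  INTEGER PRIMARY KEY, tld  TEXT NOT NULL UNIQUE);
CREATE TABLE emailHosts (hostID INTEGER PRIMARY KEY, host TEXT NOT NULL UNIQUE);
CREATE TABLE emailNames (
    emailName TEXT    NOT NULL,
    hostID    INTEGER NOT NULL REFERENCES emailHosts(hostID),
    tldID     INTEGER NOT NULL REFERENCES emailTLDs(tldID),
    PRIMARY KEY (emailName, hostID, tldID)
);
""")

def add_email(conn, address):
    """Hypothetical helper: split an address and store it across the tables."""
    local, _, domain = address.rpartition("@")   # split on the last "@"
    host, _, tld = domain.rpartition(".")        # split on the last "."
    conn.execute("INSERT OR IGNORE INTO emailHosts (host) VALUES (?)", (host,))
    conn.execute("INSERT OR IGNORE INTO emailTLDs (tld) VALUES (?)", (tld,))
    host_id = conn.execute(
        "SELECT hostID FROM emailHosts WHERE host = ?", (host,)).fetchone()[0]
    tld_id = conn.execute(
        "SELECT tldID FROM emailTLDs WHERE tld = ?", (tld,)).fetchone()[0]
    conn.execute("INSERT INTO emailNames VALUES (?, ?, ?)",
                 (local, host_id, tld_id))

add_email(conn, "john.smith@mail.example.com")
add_email(conn, "jane@example.org")
```

Reading a full address back then takes a join across all three tables, which is part of the cost of this design.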

Upvotes: 0

It doesn't store @ or a . in any of the variables.

Well, it should; [email protected] is a legal email address.

I wonder if it's better to store the email in a single column instead of the 3 columns. Is one of the ways more proper or more normalized way?

This doesn't have anything to do with normalization. It has to do with complex data types.

The relational model allows arbitrarily complex data types. A commonly used complex data type is a timestamp, which typically includes year, month, day, hour, minute, second, and microsecond.

Given a timestamp, sometimes you might need to know only the date, and sometimes you might need to know only the year or only the hour. The relational model imposes a specific burden on the dbms when dealing with complex data types. For a complex data type, the dbms is required either to return it in its entirety, or to provide functions that return its various parts. The point is that, if a user wants only the hour out of a timestamp, the user doesn't write code to get it.

SQL dbms have good support for timestamps; every dbms that I'm familiar with provides functions that return various parts of timestamps. None of them have native support for email addresses.

On a SQL platform, you have at least two alternatives to keep your database close to the relational model. You can write functions that can be incorporated into the database server (if your dbms and your programming skill allow that), or you can split up the data type into pieces so each can be addressed in its entirety like any other value.
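To illustrate the first alternative, here is a minimal sketch that registers two functions with SQLite through Python's sqlite3 module, so queries can ask for a part of an address the way they would ask for the hour of a timestamp. SQLite is only a stand-in here; other servers have their own mechanisms for user-defined functions. The names email_local and email_domain and the people table are assumptions for illustration, and the split on the last "@" is a simplification rather than a full address parser.

```python
import sqlite3

def email_local(address):
    # Part before the last "@"; NULL when the value has no "@".
    if address is None or "@" not in address:
        return None
    return address.rpartition("@")[0]

def email_domain(address):
    # Part after the last "@"; NULL when the value has no "@".
    if address is None or "@" not in address:
        return None
    return address.rpartition("@")[2]

conn = sqlite3.connect(":memory:")
# Make the Python functions callable from SQL (one argument each).
conn.create_function("email_local", 1, email_local)
conn.create_function("email_domain", 1, email_domain)

conn.execute("CREATE TABLE people (email TEXT NOT NULL)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("john.smith@example.com",), ("jane@mail.example.org",)])

parts = conn.execute(
    "SELECT email_local(email), email_domain(email) "
    "FROM people ORDER BY email").fetchall()
```

With functions like these, the address stays in one column and client code never parses it by hand.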

While there are probably some data types that make sense to split like that (street addresses might be one of them), I don't really see any compelling reason to split an email address.

This allows me some flexibility (for example checking average name length (before the @ symbol) will be faster). I can collect some statistics on email name, I can also create usernames from the name_eml attribute.

While that's true, right now I can't imagine anything at all interesting about the average length of a username. I don't find any of your reasons compelling, but you know more about your application than I do.

If you really need to do a lot of operations on the pieces, it makes more sense to keep the pieces separate. More "normal" client code should access the email addresses through a view that concatenates the pieces. (Concatenation is a lot easier than parsing an email address at run time.)
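A minimal sketch of such a view, using the email_eml table and column names from the question and SQLite via Python's sqlite3 module. The view name email_full and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email_eml (
    name_eml   TEXT NOT NULL,
    host_eml   TEXT NOT NULL,
    domain_eml TEXT NOT NULL
);
-- Hypothetical view: rebuilds the full address from the three pieces.
CREATE VIEW email_full AS
SELECT name_eml || '@' || host_eml || '.' || domain_eml AS email,
       name_eml, host_eml, domain_eml
FROM email_eml;
""")
conn.execute("INSERT INTO email_eml VALUES ('john.smith', 'example', 'com')")
conn.execute("INSERT INTO email_eml VALUES ('jane', 'mail.example', 'org')")

# Ordinary client code compares whole addresses through the view...
match = conn.execute(
    "SELECT email FROM email_full WHERE email = ?",
    ("john.smith@example.com",)).fetchall()

# ...while statistics code reads the pieces directly.
avg_len = conn.execute(
    "SELECT AVG(LENGTH(name_eml)) FROM email_eml").fetchone()[0]
```

One thing to weigh: a lookup against the concatenated column cannot use a plain index on the base columns, so for frequent whole-address comparisons it may be worth splitting the input and filtering on the pieces instead.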

Upvotes: 2

philwilks

Reputation: 669

It's extremely rare to store email addresses in three columns. If you want to do something like search on the part of the email before the @ symbol you could just use a LIKE query...

SELECT email FROM people WHERE email LIKE 'john.smith@%';

I'd be interested to hear of any real-life examples that aren't possible to do with an SQL query.
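For example, the statistics mentioned in the question can be computed from a single email column with ordinary string functions. The sketch below uses SQLite's INSTR and SUBSTR through Python's sqlite3 module; the people table matches the query above, while the sample rows are made up. It assumes each address contains exactly one "@", which does not hold for every legal address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (email TEXT NOT NULL)")
conn.executemany("INSERT INTO people VALUES (?)", [
    ("john.smith@example.com",),
    ("jane@example.com",),
    ("bob@example.org",),
])

# Average length of the part before the "@" (INSTR is 1-based).
avg_local_len = conn.execute(
    "SELECT AVG(INSTR(email, '@') - 1) FROM people").fetchone()[0]

# Number of addresses per domain (everything after the "@").
by_domain = conn.execute("""
    SELECT SUBSTR(email, INSTR(email, '@') + 1) AS domain, COUNT(*)
    FROM people
    GROUP BY domain
    ORDER BY domain
""").fetchall()
```

These queries scan every row, so the three-column layout may still be faster for heavy statistical work; whether that matters depends on table size.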

Upvotes: 0
