Oscar Wahltinez
Oscar Wahltinez

Reputation: 1195

Database redesign using sqlite3

I am afraid that this question might be a little too broad, but I have little-to-none knowledge when it comes to database design, so I will explain what I have so far before I get to the question.

I have an excel file, which I exported into a csv file, which has the following (simplified) layout:

Region, License Number in 1st Year, License Number in 2nd Year, License Number in 3rd Year...

The idea is that the licenses are incremented and at the end of every year, I have a record of what is the highest license at the time for each region. Because there are almost as many regions as there are years, it makes sense the previous layout for an excel file because we have the data displayed in what is close to be a square, but for the database I believe that it could be easier to deal with it if I reduce the number of columns and I would have this:

Region, Year, License Number

Assuming that this is right (again, I am clueless when it comes to database design) my question is the following: how can I possibly convert all the data to the suggested format without having to manually create the (many) entries for each region and copy all the licenses into the corresponding years?

The goal is to be able to tell what year a license was issued by simply looking at the license number

Upvotes: 0

Views: 117

Answers (2)

Looks like you're on the right track. You'll want a primary key on {region, year}.

If you're relatively new to databases, the procedure below will be the easiest both for you to understand and for you to get right the first try.

One problem is that you don't appear to be storing the years. If 1st year means the same year for every value in that Excel column--that is, 1st year doesn't mean 2011 for one row, and 1999 for another--then you might be able to massage the data into shape easily just using a second worksheet.

  • Copy the two columns "Regions" and "License number in 1st year" to another worksheet.
  • If 1st year means, say, 2005, then insert a column between those two, copy the number 2005 all the way down the second column, and you're done. (Done with 1st year, that is.)
  • Repeat for each year.
  • Export the results, and import them into your table.

If you think you're comfortable running queries, you can try this procedure.

  • Create a table for your data. I'm guessing at the data types.

    create table your_table_name ( region varchar(35) not null, year integer not null check (year >= 1955), license_num integer not null check (license_num >= 0) );

  • Export your spreadsheet to CSV file. Consider changing the column names to the actual years.

  • Import into sqlite.

Insert data into your_table_name.

insert into your_table_name (region, year, license_num)
select region_column_name, 1955, license_num_column_for_1955
from imported_table_name

Note carefully that you'll need to adjust the literal year and the name of the column for that year each time you run the query.

Upvotes: 1

Randy
Randy

Reputation: 16677

well.. i'm not real clear on License Number - i think you mean a total (int) count of the number of licenses for the region in that year.

you probably have more info about each region as well.

i assume also that you do not have each individual license, but instead you are just maintaining the count. (if you want to store each license then this answer would be different)

so you will need a region table - something like:

region
-------
region_id
name
other_info_possibly

then either a license table to store each license - like this

license
---------
license_id
region_id
license_issue_date
license_value
other_info_possibly

or a summary maybe like this:

license_summary
----------------
region_id
license_count
year

next, you need to decide where and when to manipulate your original data: before or after importing to the database.

it might be easier after... in this case, you need a place to dump it in 'raw' maybe :

raw_data
---------
region
year1_count
year2_count
-- etc

then you can write inserts from this raw_data table to populate the other tables.

Upvotes: 0

Related Questions