Reputation:
I have a set of AIS data in a single table. The data is based on a ships position, as in latitude and longitude together with a lot of static information such as an identifier, length, width, name and such But to increase efficiency and decrease redundancy I need to split this table into multiple (5) tables (in form of a star schema).
My question is what is the best strategy to go on about this? My ideas are:
Example of the data with limited columns:
The schema of the database shows the star schema on the left and the raw data on the right.
Upvotes: 1
Views: 256
Reputation: 19653
Forgive me if I am oversimplifying your use case, but why don't you just 1) create the dimension tables using distinct queries(generating ids in case there isn't any), 2) create the fact table with the proper constraints and 3) populate the fact table based on the original and the dimension tables with the new ids, for instance:
Data Sample:
CREATE TEMPORARY TABLE ais_data (
ship_name TEXT,
lat NUMERIC,
lon NUMERIC,
tmstp TIMESTAMP
);
INSERT INTO ais_data VALUES
('foo',1,2,'2021-02-19 12:10:25'),
('foo',1,3,'2021-02-19 12:20:25'),
('foo',1,4,'2021-02-19 12:30:25'),
('bar',1,2,'2021-02-19 12:10:25'),
('bar',1,4,'2021-02-19 12:30:25');
Creating dimensions
CREATE TEMPORARY TABLE ship_dim (
ship_id SERIAL PRIMARY KEY,
ship_name TEXT
);
INSERT INTO ship_dim (ship_name)
SELECT DISTINCT ship_name FROM ais_data;
CREATE TEMPORARY TABLE position_dim (
position_id SERIAL PRIMARY KEY,
lon NUMERIC, lat NUMERIC,
geom GEOMETRY(POINT,4326),
tmstp TIMESTAMP
);
INSERT INTO position_dim (lon,lat,tmstp,geom)
SELECT DISTINCT lon,lat,tmstp,ST_MakePoint(lon,lat)
FROM ais_data;
Creating fact table ...
CREATE TEMPORARY TABLE data_point_fact (
ship_id INT REFERENCES ship_dim(ship_id),
position_id INT REFERENCES position_dim(position_id)
);
... and populating it
INSERT INTO data_point_fact
SELECT s.ship_id,p.position_id FROM ais_data a
JOIN ship_dim s ON s.ship_name = a.ship_name
JOIN position_dim p ON p.lon = a.lon AND p.lat = a.lat AND p.tmstp = a.tmstp;
SELECT * FROM data_point_fact;
ship_id | position_id
---------+-------------
1 | 3
1 | 1
2 | 3
2 | 2
2 | 1
See this db<>fiddle
(without geometry)
Upvotes: 1