Reputation: 55
I am trying to create a Fact Table using 4 dimensions and 2 Files and i haven't been able to connect all the information. I usually work with python but now i must use this with R and SQL. I know exactly how i should connect all the info, but i don't know how to create de SQL Queries to do it. So if anyone could help me I would be very grateful.
First of all, i'll describe the Dimensions and then Files. Next i'll explain what is my idea to connect everything and retrieve the rows for fact Table
Dimension Spectators: (2000 entries)
ID, Spectator Code, Region, Genre, age class
Dimension Hour: (1440 entries)
ID, Hour, Minute, Complete Hour(HH:MM:SS), Period of the day
Dimension Date: (365 entries)
ID, Year, Month, Day, Complete Date (YYYY:MM:DD), Day of the week
Dimension Programs: (60000 entries)
ID, Station, Name of Program, Start hour of the program (HH:MM:SS), Duration(seconds), Complete Date (YYYY:MM:DD)
After the dimensions I have 2 other files
Audiences: (2.2million entries)
Id (The same as Spectators File), Complete Date(YYYY:MM:DD), station, Duration of visualization(minutes), Start
of visualization(HH:MM:SS), End of visualization(HH:MM:SS)
-Spectators File: (300000 entries)
Id(The same as Audiences File), Spectator Code(The same as Spectators Dimension), region, genre, age class
Somehow, i would like to do the fact table (based on Audiences File) using this format for each row of the table:
-ID of the Spectators dimension || Interacting ID Audience with ID of Spectators File then Code from Spectators File with Spectators Dimension
-ID of the Date Dimension || Interacting Complete Date from Audiences with Complete Date from Date Dimension
-ID of the Hour Dimension - Start Visualization || Interacting Complete Hour from Audiences with Complete Hour from Hour Dimension
-ID of the Hour Dimension - End visualization || Interacting Complete Hour from Audiences with Complete Hour from Hour Dimension
-ID of the Program Dimension (Interacting Complete Day, Station and the Interval between Start and Snd of visualization from Audiences with Complete Day, Station and Interval created by Start of Program + Duration of Program Dimension) The visualization of a spectator must be between the duration of program to assume a Row in fact table
-Duration of visualization from Audiences
Upvotes: 0
Views: 904
Reputation: 427
When you say you have 4 dimensions and 2 files, may I assume that all are present as data frames on R? Or are the 4 dimensions some external RDBMS tables that you need to connect using R? (For some reason, I cant comment on your question and hence writing it in answers here). If all data frames are present in R environment, then you can use a merge to merge the files, or SQLDF package to write SQL statements.
Upvotes: 0