PythagorasDoe
PythagorasDoe

Reputation: 55

Data Warehouse - How to create a Fact Table interecting all dimensions with R and/or SQL

I am trying to create a Fact Table using 4 dimensions and 2 Files and i haven't been able to connect all the information. I usually work with python but now i must use this with R and SQL. I know exactly how i should connect all the info, but i don't know how to create de SQL Queries to do it. So if anyone could help me I would be very grateful.

First of all, i'll describe the Dimensions and then Files. Next i'll explain what is my idea to connect everything and retrieve the rows for fact Table

4 DIMENSIONS

  1. Dimension Spectators: (2000 entries)

    ID, Spectator Code, Region, Genre, age class
    
  2. Dimension Hour: (1440 entries)

    ID, Hour, Minute, Complete Hour(HH:MM:SS), Period of the day
    
  3. Dimension Date: (365 entries)

    ID, Year, Month, Day, Complete Date (YYYY:MM:DD), Day of the week
    
  4. Dimension Programs: (60000 entries)

    ID, Station, Name of Program, Start hour of the program (HH:MM:SS), Duration(seconds), Complete Date (YYYY:MM:DD)
    

2 FILES

After the dimensions I have 2 other files

  1. Audiences: (2.2million entries)

    Id (The same as Spectators File), Complete Date(YYYY:MM:DD), station, Duration of visualization(minutes), Start
    

    of visualization(HH:MM:SS), End of visualization(HH:MM:SS)

  2. -Spectators File: (300000 entries)

    Id(The same as Audiences File), Spectator Code(The same as Spectators Dimension), region, genre, age class
    

Somehow, i would like to do the fact table (based on Audiences File) using this format for each row of the table:

-ID of the Spectators dimension || Interacting ID Audience with ID of Spectators File then Code from Spectators File with Spectators Dimension

-ID of the Date Dimension || Interacting Complete Date from Audiences with Complete Date from Date Dimension

-ID of the Hour Dimension - Start Visualization || Interacting Complete Hour from Audiences with Complete Hour from Hour Dimension

-ID of the Hour Dimension - End visualization || Interacting Complete Hour from Audiences with Complete Hour from Hour Dimension

-ID of the Program Dimension (Interacting Complete Day, Station and the Interval between Start and Snd of visualization from Audiences with Complete Day, Station and Interval created by Start of Program + Duration of Program Dimension) The visualization of a spectator must be between the duration of program to assume a Row in fact table

-Duration of visualization from Audiences



Upvotes: 0

Views: 904

Answers (1)

Varun kadekar
Varun kadekar

Reputation: 427

When you say you have 4 dimensions and 2 files, may I assume that all are present as data frames on R? Or are the 4 dimensions some external RDBMS tables that you need to connect using R? (For some reason, I cant comment on your question and hence writing it in answers here). If all data frames are present in R environment, then you can use a merge to merge the files, or SQLDF package to write SQL statements.

Upvotes: 0

Related Questions