Reputation: 21
I'm new to this Big data world. As a course project, I'm working on sensor networks and want to store sensor data on HBase. Currently data are stored in MySQL database. I'm trying to load this data into HBase. But data are growing so fast and querying on that is getting very slow.
Here is the MYSQL table schema: SensorLog(sensorID, userID,time,date).
So this tables saves a sensor firing logs. For each user (45 users in total) there are 25 motion sensors in his apartment. Every time a user moves in his apartment a sensor will fire, and this event will be logged into this table. The main question is what sensors fired for a specific user in a specific time interval and day.
I came up with threeHBase schema, and I'm just wondering to ask your opinion about them. In these schema, I present time as seconds in a day, i.e. an integer number in the range of 0-86400.
Schema1: Rowkey: Date; Column-Family: Time { cq:(t0-t86400); cv:(userID,sensorID)}
Schema2: Rowkey: (Date,userID); Column-Family: Time { cq:(t0-t86400); cv:(sensorID)}
Schema3: Rowkey: (Date,userID); Column-Family: Time { cq:(s1-s25); cv:(time)}
Would you please let me know which schema is better and more efficient? I appreciate any help in advance.
Upvotes: 0
Views: 963
Reputation: 25939
45 people and 25 sensors hardly seems like something you'd want to store in HBase.
If you're keen on using HBase anyway, than a key design should be driven by your read and write patterns. for instance assuming each user only gets a few measurements a second and the number of users affects the load a composite key of row key userId, timestamp and sensor Id seems to make sense where the value would be the reading
Lastly , you may want to look at OpenTSDB which is open source, builds on HBase and was built to store time series measurement at scale. You can see its schema here
Upvotes: 0