Reputation: 353
I need to create a table in MySQL version 5.5
this table will have information like:
Here's what i think:
create table statistics (
browser varchar(255) not null,
version float not null,
ip varchar(40) not null,
dateandtime datetime,
referrer varchar(255)
);
I read on mysql.com that I need to use indexes to make my query fast but now my problem is what index should I create in order to make that table fast to query?
I need to query all the fields eg:
Thanks
Upvotes: 9
Views: 1080
Reputation: 2485
BROWSER AND OS STATISTICS FROM USER-AGENT STRINGS IN MYSQL
--------------------------------------------------------------
Let’s say you have a MySQL table called ‘loginhistory’ that contains ‘userid’ and ‘useragent’. To make a count of how many times certain OSes occurred in the user-agent strings, I used the following MySQL query:
SELECT OS, COUNT(OS) AS freq FROM
(SELECT
CASE
WHEN useragent LIKE '%iPad%' THEN 'iPad'
WHEN useragent LIKE '%iPhone%' THEN 'iPhone'
WHEN useragent LIKE '%Android%' THEN 'Android'
WHEN useragent LIKE '%Mac OS X%' THEN 'OS X'
WHEN useragent LIKE '%X11%' THEN 'Linux'
WHEN useragent LIKE '%Windows NT 6.3%' THEN 'Windows 8.1'
WHEN useragent LIKE '%Windows NT 6.2%' THEN 'Windows 8'
WHEN useragent LIKE '%Windows NT 6.1%' THEN 'Windows 7'
WHEN useragent LIKE '%Windows NT 6.0%' THEN 'Windows Vista'
WHEN useragent LIKE '%Windows NT 5.2%' THEN 'Windows Server 2003; Windows XP x64 Edition'
WHEN useragent LIKE '%Windows NT 5.1%' THEN 'Windows XP'
WHEN useragent LIKE '%Windows NT 5.0%' THEN 'Windows 2000'
WHEN useragent LIKE '%Windows NT 4.0%' THEN 'Microsoft Windows NT 4.0'
WHEN useragent LIKE '%Windows 9' THEN 'Windows 95/98/Millenium'
WHEN useragent LIKE '%Windows CE' THEN 'Windows CE'
ELSE 'Other'
END OS
FROM loginhistory) AS osses
GROUP BY OS
ORDER BY freq DESC
Through the use of CASE, WHEN, THEN, the user-agent string is searched for certain elements and translated to a friendly OS name. The outer query then groups these newly created OS names and counts the frequency of every OS, outputting something like this:
+---------------+------+
| OS | freq |
+---------------+------+
| Windows 7 | 173 |
| Windows 8.1 | 152 |
| iPad | 63 |
| Windows Vista | 13 |
| OS X | 10 |
| iPhone | 8 |
| Android | 7 |
+---------------+------+
7 rows in set (0.00 sec)
The same can be done to calculate the frequency of browsers in all user-agent strings:
SELECT browser, COUNT(browser) AS freq FROM
(SELECT
CASE
WHEN useragent LIKE '%Chrome%' THEN 'Chrome'
WHEN useragent LIKE '%Safari%' THEN 'Safari'
WHEN useragent LIKE '%Firefox%' THEN 'Firefox'
WHEN useragent LIKE '%MSIE 7%' THEN 'IE7'
WHEN useragent LIKE '%MSIE 8%' THEN 'IE8'
WHEN useragent LIKE '%MSIE 9%' THEN 'IE9'
WHEN useragent LIKE '%MSIE 10%' THEN 'IE10'
WHEN useragent LIKE '%rv:11%' THEN 'IE11'
ELSE 'Other'
END browser
FROM loginhistory) AS browsers
GROUP BY browser
ORDER BY freq DESC
Which would output something like this:
+---------+------+
| browser | freq |
+---------+------+
| IE7 | 128 |
| IE11 | 119 |
| Chrome | 83 |
| Safari | 38 |
| Firefox | 7 |
| IE10 | 4 |
+---------+------+
6 rows in set (0.00 sec)
This data could then be dumped straight into a library like Chart.js which will automatically make a pie chart from the frequency data. Or you could calculate percentages yourself relative to the sum of all frequencies.
If you also have a date or timestamp column next to every user-agent string you could add a WHERE clause to, for example, only show the statistics of OSes and browsers used in the last six months.
Upvotes: 0
Reputation: 510
I would recommend this:
Use intergers instead of chars/varchars. this way you index faster (except the referrer). Also, I can recommend to get summary tables. Although it's not really normalized but the query will be executed instantly - specially if you have a big organization with lots of traffic.
So here's the tables:
create table statistics (
browser tinyint(3) UNSIGNED not null default 0,
version float(4,2) not null default 0,
ip INT(10) UNSIGNED not null default 0,
createdon datetime,
referrer varchar(5000),
key browserdate (browser, createdon),
key ipdate (ip, createdon),
// etc..
);
browser 0 = unknow, 1 = firefox etc.. This can be done in your code (so you load the same code for inserting and selecting). i dont use enum here because if you need to alter the table and you have millions of records this can be painful. new browser = new number in the code which is way faster to change.
this table can be used to resummarized all the other tables if something happens. so you create an index for the inline summary table (example browser)
Now the summary table:
create table statistics_browser_2011_11 (
browser tinyint(3) UNSIGNED not null default 0,
version float(4,2) not null default 0,
number bigint(20) not null default 0,
createdon datetime,
unique key browserinfo (createdon, browser, version)
); // browsers stats for november 2011
This way when you inserts (you get the date of the user when he accessed the site and create a $string that match with the table name) into this table you only have to use the on duplicate key number = number +1
. this way when you retrieve the browser statistics is super fast.
now here you will have to create a merge table because if you are the second of the month and you want to query the last 7 days, you will need the current month and the last month table. here's more info: http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html
and you repeat the process for the other information: ip, referrer etc...
in order to maintain these tables, you will have to create a cronjob that creates tables for the next month. simple PHP script that gets the current year/month and then create the table for the next month if it does not exists and then merge them)
this might be a little of work but this is how i do it at work (with similar data) with 12 terabytes of data and 5,000 employees that fetch the databases. my average load time for each query is approx 0.60 seconds per requests.
Upvotes: 12
Reputation: 47321
I think your schema can be improved to
create table statistics
(
browser enum('Firefox','IE','Opera','Chrome','Safari','Others') not null
default 'Others',
// major browser family only
// instead of using free-form of varchar
user_agent text,
// to store the complete user agents
// mainly for reference purpose only
version float not null,
ip varchar(40) not null,
dateandtime datetime not null,
referer varchar(2000)
// 255 is no sufficient for referer
);
Index key
datetime, browser
query 1
select browser, count(*) from statistics
where dateandtime between ? and ?
group by browser;
query 2
select count(*) from statistics
where dateandtime between ? and ?;
query 3
select referer from statistics
where dateandtime between ? and ?;
Upvotes: 1