m.yagmur
m.yagmur

Reputation: 41

Aggregation on distinct values after substring() is used

I'm working with a log collector app that collects app usage logs from multiple computers. The logs contain fields including "app" which is the path to an app and "seconds" which stores how long it was used.

My goal is to get the top 10 most used apps. Since I store app paths and not just app names, I use a substring function to extract app names. I use the following sql query:

SELECT substring(app from '[^\\\/:*?"<>|\r\n]+$') as app,
sum(seconds) as sum
FROM tbl_logs_app_focus
GROUP BY app
ORDER BY sum DESC
NULLS LAST
LIMIT 10

This is an example result with this query:

**"app"                        "sum"**
"explorer.exe"                 551404.9595867295
"OneDrive.exe"                 510425.765499115
"putty.exe"                    272261.3288867716
"chrome.exe"                   160638.84143324356
"MobaXterm_Personal_22.1.exe"  92442.94607090577
"Code.exe"                     88504.6548490785
"MobaXterm_Personal_22.1.exe"  21666.859371185303
"chrome.exe"                   8569.392006158829
"vmware-vmrc.exe"              7976.217761993408
"pgAdmin4.exe"                 6493.827002130449

As you can notice, there are duplicates like "MobaXterm_Personal_22.1.exe". The reason why this happens is that the app names were extracted from logs of two different computers. And since the usernames of those computers were different, even though the app names were the same, the paths were different. Like this:

"C:\Users\<Example User 1>\AppData\Local\Temp\Temp1_MobaXterm_Portable_v22.1.zip\MobaXterm_Personal_22.1.exe"
"C:\Users\<Example User 2>\AppData\Local\Temp\Temp1_MobaXterm_Portable_v22.1.zip\MobaXterm_Personal_22.1.exe"

The current query displays the paths as file names but sums the seconds of distinct paths instead of distinct file names. How can I modify the query so it sums the distinct file names and doesn't cause duplicates?

Upvotes: 0

Views: 37

Answers (1)

m.yagmur
m.yagmur

Reputation: 41

Turns out the problem was that the alias and the actual column name were the same but the group_by function was taking the column name. I changed the alias to app_name and used the group_by function with that which solved the problem.

Here's the correct query:

SELECT substring(app from '[^\\\/:*?"<>|\r\n]+$') as app_name,
sum(seconds) as sum
FROM tbl_logs_app_focus
GROUP BY app_name
ORDER BY sum DESC
NULLS LAST
LIMIT 10

Upvotes: 1

Related Questions