Reputation: 1349
Need help in writing a splunk query that can help me measure the stats correctly. Please note the data is sometimes sent to splunk more than once sometimes due to an error on our server
Here is an overview . I am interested in knowing for how long a user has been using my application. An event will have user_id, connection state ( values are connected, paused and terminated ), session_length, timestamp. The usage of application for a user can be retrieved from field session_length.
When the user is either in paused or terminated state, it would contain session length value. If the user state contains connected, paused and terminated for a sequence , then the session length should be derived from terminated state, else the session length should be computed from paused state.
Unfortunately the paused and terminated data is sent more than once to the server. I would like to filter such data.
The sample data is written below.
Consideration --> If for a sequence of events for that user , the terminated/paused state is reported twice, then use the oldest session length for that sequence.
Sample Data
User1 Terminated session_length=13 timestamp=10.13.
User1 Terminated session_length=11 timestamp=10.11.
User1 Paused session_length=10 timestamp=10.10
User1 Connected timestamp=10.00
--
User1 Paused session_length=5 timestamp=9.05
User1 Connected timestamp=9.00
--
User2 Terminated session_length=13 timestamp=10.13.
User2 Terminated session_length=11 timestamp=10.11.
User2 Paused session_length=10 timestamp=10.10
User2 Connected timestamp=10.00
--
User1 Terminated session_length=6 timestamp=9.06
User1 Connected timestamp=8.00
--
For the sample data set above, I have used the delimiter -- to filter the sequences and explain how the duration will be calculated. 1) For the User1 ( from top to bottom ), consider the first sequence . User state is connected, paused, terminated,Terminated. So its session length 11 ( consider the earliest reported session length ). 2) For User 1 , sequence 2 , the user state is connected, paused. So its session length - 5 3) For user 2 , sequence 2, the User2 states are connected, paused, terminated, terminated. Its session length - 11. 4) For User 1 , sequence 4 , the user state is connected, terminated. Its session length is 6
The splunk query should report , a sum of session lengths per user which is
User_Id TotalSessionLength User1 22 User2 11
Upvotes: 1
Views: 1823
Reputation: 2475
The most direct solution begins with the transaction command:
index=my_user_events_index
| transaction user_id startswith="connection_state=Connected" unifyends=true
This creates a single event for each session, as you grouped them above. Since transaction
groups all values of the same field name into a single multivalued field, we won't be able to tell which session_length goes with which connection_state. To handle this, we rename the session_length field according to the connection_state before we hit the transaction:
index=my_user_events_index
| eval paused_session_length=case(connection_state=="Paused",session_length)
| eval termination_session_length=case(connection_state=="Terminated",session_length)
| transaction user_id startswith="connection_state=Connected" unifyends=true
Almost there! Now we have events that represent a session, each with a multivalued paused_session_length and a multivalued terminated_session_length. We need to apply our business logic using some creative multivalue evaluation functions:
index=my_user_events_index
| eval paused_session_length=case(connection_state=="Paused",session_length)
| eval termination_session_length=case(connection_state=="Terminated",session_length)
| transaction user_id startswith="connection_state=Connected" unifyends=true
| eval session_length=coalesce(mvindex(mvsort(terminated_session_length),0),mvindex(mvsort(paused_session_length),0))
From here, we need only a simple stats command to sum the session lengths by user:
index=my_user_events_index
| eval paused_session_length=case(connection_state=="Paused",session_length)
| eval termination_session_length=case(connection_state=="Terminated",session_length)
| transaction user_id startswith="connection_state=Connected" unifyends=true
| eval session_length=coalesce(mvindex(mvsort(terminated_session_length),0),mvindex(mvsort(paused_session_length),0))
| stats sum(session_length) as TotalSessionLength by user_id
Transactions can be fairly slow with a lot of data, and the multivalued fields are a headache. Another alternative is to use streamstats by user_id to carry forward the timestamp of the most recent Connected event, then use stats to aggregate the session and compute the session length. I'll leave that as an exercise for the reader (or another SO question) :)
Upvotes: 2