I have a JSON file with 140 of these elements (`activities`), and I need to write a Python program that transforms it into this (`user_sessions`). Instead of being organized by activity id and other information, the data should be grouped by `user_id`, subject to certain conditions.

My question is: how can I group by user id and check all the data for the same id so that it meets those conditions?
I used a lambda function as the sort key: `data['activities'].sort(key=lambda x: x['user_id'])`, but that only sorts the list by `user_id`, and I need to group it by `user_id`.
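Unlike sorting, grouping collects each user's activities into a list of its own. A minimal sketch using `collections.defaultdict` follows; the sample records and the `group_by_user` name are hypothetical, and this approach does not need the list to be sorted first.

```python
from collections import defaultdict

# Hypothetical sample records; the real ones also carry answered_at/first_seen_at
activities = [
    {"id": 198891, "user_id": "emr5zqid"},
    {"id": 251953, "user_id": "3pyg3scx"},
    {"id": 379044, "user_id": "3pyg3scx"},
]


def group_by_user(activities):
    """Map each user_id to the list of that user's activities, in input order."""
    groups = defaultdict(list)
    for activity in activities:
        groups[activity["user_id"]].append(activity)
    return dict(groups)


grouped = group_by_user(activities)
for user_id, user_activities in grouped.items():
    print(user_id, [a["id"] for a in user_activities])
# emr5zqid [198891]
# 3pyg3scx [251953, 379044]
```

Once every user's activities sit in one list, each list can be checked against whatever per-user conditions apply.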
Here is a sample of the JSON: `activities` is how it is currently structured, and `user_sessions` is how I need it to be.
```json
{"activities": [
  {
    "id": 198891,
    "user_id": "emr5zqid",
    "answered_at": "2021-09-13T02:38:34.117-04:00",
    "first_seen_at": "2021-09-13T02:38:16.117-04:00"
  }
]}
```

```json
{
  "user_sessions": {
    "3pyg3scx": [
      {
        "ended_at": "2021-09-10T19:51:26.799-04:00",
        "started_at": "2021-09-10T19:22:23.799-04:00",
        "activity_ids": [
          251953,
          379044
        ],
        "duration_seconds": 173.0
      },
      {
        "ended_at": "2021-09-11T04:33:50.799-04:00",
        "started_at": "2021-09-11T04:05:20.799-04:00",
        "activity_ids": [
          296400,
          247727,
          461955
        ],
        "duration_seconds": 171.3
      }
    ]
  }
}
```
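The question does not spell out the conditions that split a user's activities into sessions, so the following is only a sketch under hypothetical rules. It assumes a new session starts when the gap between one activity's `answered_at` and the next activity's `first_seen_at` exceeds `max_gap_seconds` (a made-up parameter, default 300). It takes `started_at` from the first activity's `first_seen_at` and `ended_at` from the last activity's `answered_at`. It also computes `duration_seconds` as the sum of each activity's `answered_at - first_seen_at`. The `duration_seconds` values in the sample above do not obviously equal `ended_at - started_at`, so that summing rule is a guess. `build_user_sessions` and the sample data are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime


def build_user_sessions(activities, max_gap_seconds=300):
    """Group activities into per-user sessions shaped like 'user_sessions'.

    ASSUMPTIONS (hypothetical, not from the question):
    - a new session starts when the gap between one activity's answered_at
      and the next activity's first_seen_at exceeds max_gap_seconds;
    - duration_seconds is the sum of each activity's answered_at - first_seen_at.
    """
    by_user = defaultdict(list)
    for activity in activities:
        by_user[activity["user_id"]].append(activity)

    user_sessions = {}
    for user_id, acts in by_user.items():
        # Put this user's activities in chronological order
        acts.sort(key=lambda a: datetime.fromisoformat(a["first_seen_at"]))
        sessions, current = [], []
        for act in acts:
            if current:
                gap = (datetime.fromisoformat(act["first_seen_at"])
                       - datetime.fromisoformat(current[-1]["answered_at"]))
                if gap.total_seconds() > max_gap_seconds:
                    sessions.append(current)  # close the running session
                    current = []
            current.append(act)
        if current:
            sessions.append(current)

        user_sessions[user_id] = [
            {
                "ended_at": s[-1]["answered_at"],
                "started_at": s[0]["first_seen_at"],
                "activity_ids": [a["id"] for a in s],
                "duration_seconds": round(sum(
                    (datetime.fromisoformat(a["answered_at"])
                     - datetime.fromisoformat(a["first_seen_at"])).total_seconds()
                    for a in s), 1),
            }
            for s in sessions
        ]
    return {"user_sessions": user_sessions}


# Hypothetical sample input in the same shape as 'activities'
sample = [
    {"id": 1, "user_id": "u1",
     "first_seen_at": "2021-09-10T19:22:23.799-04:00",
     "answered_at": "2021-09-10T19:23:00.799-04:00"},
    {"id": 2, "user_id": "u1",
     "first_seen_at": "2021-09-10T19:24:00.799-04:00",
     "answered_at": "2021-09-10T19:24:30.799-04:00"},
    {"id": 3, "user_id": "u1",
     "first_seen_at": "2021-09-11T04:05:20.799-04:00",
     "answered_at": "2021-09-11T04:06:00.799-04:00"},
]
result = build_user_sessions(sample)
print(json.dumps(result, indent=2))
```

With this sample, activities 1 and 2 are 60 seconds apart and land in one session, while activity 3 starts a new one the next day. If the real conditions differ, only the gap check and the `duration_seconds` expression should need to change.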
And this is my code, although it doesn't do anything yet toward what I asked:
```python
import json
import datetime

# Read the JSON
with open('/Users/kenyacastellanos/Downloads/data.json') as json_data_file:
    data = json.load(json_data_file)
# print(data)

# Sort by key (the key is user_id), using a lambda function as the sort key
data['activities'].sort(key=lambda x: x['user_id'])

for activity in data['activities']:
    # Duration of a single activity
    date1 = datetime.datetime.fromisoformat(activity['answered_at'])
    date2 = datetime.datetime.fromisoformat(activity['first_seen_at'])
    difference_date = date1 - date2
    # total_seconds() includes days and microseconds; .seconds alone does not
    print("Duration in seconds:", difference_date.total_seconds())
```
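The original loop printed `difference_date.seconds` and `difference_date.microseconds` side by side. The sketch below shows why `timedelta.total_seconds()` is the safer choice: `.seconds` is only the seconds component (0 to 86399) of a normalized `timedelta`, so it drops whole days and looks wrong for negative differences. The first two timestamps come from the question's sample; the other deltas are illustrative.

```python
from datetime import datetime, timedelta

first_seen = datetime.fromisoformat("2021-09-13T02:38:16.117-04:00")
answered = datetime.fromisoformat("2021-09-13T02:38:34.117-04:00")
delta = answered - first_seen
print(delta.seconds, delta.total_seconds())  # 18 18.0

# .seconds is only the seconds *component*; days are stored separately
long_delta = timedelta(days=1, seconds=5)
print(long_delta.seconds, long_delta.total_seconds())  # 5 86405.0

# A negative delta is normalized to days=-1 plus positive seconds
neg = first_seen - answered
print(neg.seconds, neg.total_seconds())  # 86382 -18.0
```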
Okay, so I did this:
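```python
import datetime
import itertools

# Assumes `data` was loaded and sorted by user_id as in the question's code
user_sessions = []
for x in data['activities']:
    difference_date = (datetime.datetime.fromisoformat(x['answered_at'])
                       - datetime.datetime.fromisoformat(x['first_seen_at']))
    user_sessions.append((x['user_id'], x['id'], difference_date))
print("User sessions: ", user_sessions)
# groupby yields (key, iterator) pairs for runs of consecutive equal keys
for user_id, group in itertools.groupby(user_sessions, key=lambda x: x[0]):
    tot = sum((session[2] for session in group), datetime.timedelta(0))
    if tot <= datetime.timedelta(seconds=300):
        print(user_id, "-> Duration in secs:", tot.total_seconds())
```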
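First, I append the fields I want to work with (user id, activity id and duration) to a list, and print it to make sure it looks as expected. Then, with `itertools.groupby`, I group the entries by `user_id`, which is what I wanted. I also compute the total duration per user instead of just the duration of a single activity (which is what I had before). Note that `groupby` only groups consecutive items with the same key, so this relies on the list having been sorted by `user_id` first.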
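The sorting requirement is easy to miss, so here is a small sketch with hypothetical `(user_id, activity_id, duration)` tuples that are deliberately not sorted. Without a sort, `groupby` produces user `"a"` twice. Sorting with the same key function first gives one group per user, and `sum(..., timedelta(0))` adds up the durations. `totals_per_user` is a hypothetical helper name.

```python
import itertools
from datetime import timedelta
from operator import itemgetter

# Hypothetical (user_id, activity_id, duration) tuples, deliberately NOT sorted
records = [
    ("a", 1, timedelta(seconds=100)),
    ("b", 2, timedelta(seconds=50)),
    ("a", 3, timedelta(seconds=120)),
]


def totals_per_user(records):
    """Sum durations per user_id; groupby only merges *adjacent* equal keys,
    so the records are sorted by the same key first."""
    key = itemgetter(0)
    return {
        user_id: sum((r[2] for r in group), timedelta(0))
        for user_id, group in itertools.groupby(sorted(records, key=key), key=key)
    }


unsorted_keys = [k for k, _ in itertools.groupby(records, key=itemgetter(0))]
print(unsorted_keys)  # ['a', 'b', 'a']
print(totals_per_user(records))
```

Passing `timedelta(0)` as the start value matters: plain `sum()` starts from the integer `0`, and `0 + timedelta(...)` raises a `TypeError`.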