Reputation: 119
I have a pandas dataframe containing Windows 10 logs. I want to convert this pandas df to JSON. What is an efficient way to do this?
I already managed to generate the default (flat) JSON output from the pandas df, but it is not nested. What I currently have:
{
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
    }
}
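This is roughly what a plain to_json export of my df gives, e.g. something like:
import json
flat = json.loads(df.to_json(orient="index"))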
I want it to look like this:
{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,  # ("counter" value)
            "Excel": 0
        }
    },
    "1": ...
}
Upvotes: 1
Views: 1202
Reputation: 16660
It seems to me that you want to create JSON from data aggregated on ['time', 'timeFloat', 'internal_time'],
which you can get with:
df.groupby(['time', 'timeFloat', 'internal_time'])
However, your example suggests that you want to keep the index keys ("0", "1", etc.), which is at odds with that aggregation.
The aggregated values from one time point:
"Firefox" : 0
"Excel" : 0
seem to correspond to these index keys, which will be lost when you do the aggregation.
However, if you decide to use aggregation, the code would look something like this:
# reading in data:
import pandas as pd
import json
json_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
    }
}
df = pd.DataFrame.from_dict(json_data)
df = df.T  # transpose so that each record becomes a row
# processing: group rows that share the same time columns and collect the rest into lists
ddf = df.groupby(['time', 'timeFloat', 'internal_time'], as_index=False).agg(lambda x: list(x))
# build the nested {ProcessName: counter} mapping for each group
ddf['Processes'] = ddf.apply(lambda r: dict(zip(r['ProcessName'], r['counter'])), axis=1)
ddf = ddf.drop(['ProcessName', 'counter'], axis=1)
# printing the result:
json2 = json.loads(ddf.to_json(orient="records"))
print(json.dumps(json2, indent=4, sort_keys=True))
Result:
[
    {
        "Processes": {
            "Excel": 0,
            "Firefox": 0
        },
        "internal_time": 0.0,
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0
    },
    {
        "Processes": {
            "Word": 0
        },
        "internal_time": 1.5533333333,
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0
    }
]
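If you also want the enumerated keys ("0", "1", ...) from your desired output instead of a list, one possible follow-up (a small sketch on top of the json2 result above) is to re-key the records by position:
keyed = {str(i): rec for i, rec in enumerate(json2)}
print(json.dumps(keyed, indent=4, sort_keys=True))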
Upvotes: 2
Reputation: 504
As I understand it, you need to group the objects by "time" and merge the counters from the different processes. If so, here is an example implementation:
import json

input_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "ZXC",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "3": {
        "ProcessName": "QWE",
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    }
}
def group_input_data_by_time(dict_data):
    time_data = {}
    for value_dict in dict_data.values():
        counter = value_dict["counter"]
        process_name = value_dict["ProcessName"]
        time_ = value_dict["time"]
        common_data = {
            "time": time_,
            "timeFloat": value_dict["timeFloat"],
            "internal_time": value_dict["internal_time"],
        }
        common_data = time_data.setdefault(time_, common_data)
        processes = common_data.setdefault("Processes", {})
        processes[process_name] = counter
    # if required to change keys from time to enumerated
    result_dict = {}
    for ind, value in enumerate(time_data.values()):
        result_dict[str(ind)] = value
    return result_dict
print(json.dumps(group_input_data_by_time(input_data), indent=4))
Result is:
{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,
            "ZXC": 0
        }
    },
    "1": {
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "QWE": 0
        }
    }
}
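If your data already sits in a pandas DataFrame, as in the question, you could presumably build the input dict for this function with something along these lines (assuming the same column names as above):
input_data = df.to_dict(orient="index")
result = group_input_data_by_time(input_data)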
Upvotes: 1