Reputation: 2417
The use case of my project is to display each author's commit with the size of each commit per day. This is how I need to represent my data
For this what I did is
timed_commits = commit_data.set_index('Date')
grouped = timed_commits.groupby(by=["Author"])
resampled = grouped.resample("D").agg(
{"SHA": "size", "Insertion": "sum", "Deletion": "sum"}
) # get the total count of commits in a day with total insertion and deletion
this gave me the output as the below one
Here Author and Date are the index while SHA, Insertion and Deletion are the columns. The reason why Author and Date are index is I want to know the per day commits of each author while I want the size of each commit(through insertion) as well
For such object I could not able to format in this way or any other way(better to have field name for author value and date value) which would support to display in the table as in the image I attached for it
{
'author1': {
'2017-10-18': {'SHA': 1, 'Insertion': 1.0, 'Deletion': 3.0},
'2017-10-19': {'SHA': 2, 'Insertion': 1.0, 'Deletion': 3.0},
'2017-10-20': {'SHA': 6, 'Insertion': 1.0, 'Deletion': 3.0},
'2017-10-21': {'SHA': 9, 'Insertion': 1.0, 'Deletion': 3.0},
},
'author2': {
'2017-10-18': {'SHA': 3, 'Insertion': 8.0, 'Deletion': 3.0},
'2017-10-19': {'SHA': 19, 'Insertion': 10.0, 'Deletion': 3.0},
'2017-10-20': {'SHA': 23, 'Insertion': 1.0, 'Deletion': 3.0},
'2017-10-21': {'SHA': 44, 'Insertion': 1.0, 'Deletion': 3.0},
}
}
I played with to_dict but did not make it workable.
this is the dataframe(here commit hash i.e sha is repeated because of the number of files changed in that particular commit). This is taken from git logs.
SHA Timestamp Date Author Insertion Deletion Churn File path
1 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark 31.0 0.0 31.0 packages/react-reconciler/src/__tests__/ReactI...
2 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark 1.0 1.0 0.0 packages/react-test-renderer/src/ReactTestRend...
3 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark 24.0 14.0 10.0 packages/react/src/ReactAct.js
5 e2453e200 Fri Jun 25 15:39:46 2021 -0400 2021-06-25 19:39:46+00:00 Andrew Clark 50.0 0.0 50.0 packages/react-reconciler/src/__tests__/ReactI...
7 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn 4.0 5.0 -1.0 packages/react-devtools-shared/src/__tests__/F...
8 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn 4.0 4.0 0.0 packages/react-devtools-shared/src/__tests__/c...
9 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn 12.0 12.0 0.0 packages/react-devtools-shared/src/__tests__/c...
10 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn 7.0 6.0 1.0 packages/react-devtools-shared/src/__tests__/e...
11 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn 47.0 42.0 5.0 packages/react-devtools-shared/src/__tests__/i...
12 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn 7.0 6.0 1.0 packages/react-devtools-shared/src/__tests__/o...
Upvotes: 0
Views: 245
Reputation: 11321
Maybe I completely misunderstood.
I'm assuming that you want to transform a DataFrame df
SHA Insertion Deletion
Author Date
author1 2017-10-18 1 1.0 3.0
2017-10-19 2 1.0 3.0
2017-10-20 6 1.0 3.0
2017-10-21 9 1.0 3.0
author2 2017-10-18 3 8.0 3.0
2017-10-19 19 10.0 3.0
2017-10-20 23 1.0 3.0
2017-10-21 44 1.0 3.0
into the dict
-format you have provided?
If so, then try this:
result = {
key: group.reset_index(level=0, drop=True).to_dict(orient='index')
for key, group in df.groupby('Author')
}
or this
result = (df.groupby('Author')
.apply(lambda sdf: sdf.reset_index(level=0, drop=True).to_dict(orient='index'))
.to_dict())
Result for the sample:
{'author1': {'2017-10-18': {'Deletion': 3.0, 'Insertion': 1.0, 'SHA': 1},
'2017-10-19': {'Deletion': 3.0, 'Insertion': 1.0, 'SHA': 2},
'2017-10-20': {'Deletion': 3.0, 'Insertion': 1.0, 'SHA': 6},
'2017-10-21': {'Deletion': 3.0, 'Insertion': 1.0, 'SHA': 9}},
'author2': {'2017-10-18': {'Deletion': 3.0, 'Insertion': 8.0, 'SHA': 3},
'2017-10-19': {'Deletion': 3.0, 'Insertion': 10.0, 'SHA': 19},
'2017-10-20': {'Deletion': 3.0, 'Insertion': 1.0, 'SHA': 23},
'2017-10-21': {'Deletion': 3.0, 'Insertion': 1.0, 'SHA': 44}}}
EDIT: Another version used by @milan:
result = [
{
"author": key,
"commit_activity": group.to_dict(orient="records"),
"timestamp": [index[1] for index in list(group.index)]
}
for key, group in df.groupby("Author")
]
Result of this version would look like:
[
{'author': 'Aaron Pettengill',
'commit_activity': [{'SHA': 2,
'Insertion': 156.0,
'Deletion': 8.0,
'File path': 2}],
'timestamp': [Timestamp('2020-05-01 00:00:00+0000', tz='UTC')]},
{'author': 'Alex Rohleder',
'commit_activity': [{'SHA': 5,
'Insertion': 5.0,
'Deletion': 5.0,
'File path': 5}],
'timestamp': [Timestamp('2019-09-06 00:00:00+0000', tz='UTC')]},
{'author': 'Alex Taylor',
'commit_activity': [{'SHA': 2,
'Insertion': 30.0,
'Deletion': 3.0,
'File path': 2}],
'timestamp': [Timestamp('2020-04-29 00:00:00+0000', tz='UTC')]}
]
Upvotes: 1