Reputation: 25
Here is the given data format:
USERID, USERNAME, EVENTID, DATE, AMOUNT
001,User1,EV-001,2020-01-01,1000
E001,User1,EV-001,2021-01-01,1000
E001,User1,EV-002,2020-03-01,300
E001,User1,EV-002,2020-04-01,500
E002,User2,EV-003,2020-01-02,100
E002,User2,EV-003,2020-02-02,200
E002,User3,EV-003,2020-03-02,300
E003,User4,EV-004,2024-01-01,100
If I am supposed to sum column 4 (AMOUNT) for each EVENTID, with the results sorted by USERID, how do I group them?
For every row in the CSV, I tried building a dict with EVENTID as the key and a list of the other columns as the value. I then summed over the list for each key, but that gives me the sum per EVENTID, and I cannot find a way to group the rows per USERID before summing.
# I do this for every row
if (row['DATE'] <= input_date):
    vesting_by_award[row['AWARD_ID']].append(
        {'USERID': row['USERID'], 'USERNAME':
         row['USERNAME'], 'AMOUNT': row['AMOUNT']})
# Then I sum the amounts collected under each key
for k, v in vesting_by_award.items():
    # print(list(v))
    result[k] = {'SUM': sum(
        int(item['AMOUNT']) for item in list(v))}
input_date : 2020-04-01
output : {'EV001': {'SUM': 1000}, 'EV002': {'SUM': 800}, 'EV003': {'SUM': 600}}
desired_output :
E001,User1,EV-001,1000
E001,User1,EV-002,800
E002,User2,EV-001,600
E003,User3,EV-003,0
The intention is to do this without using pandas.
Upvotes: 0
Views: 141
Reputation: 101
I'm a little confused as to what you're trying to do. Your dictionary/hash table approach is the way to do this without pandas, but you need to reorganize your dictionary.
Assuming you are trying to sum AMOUNT for each EVENTID within each USERID, you need to set up the dictionary with USERID as the first key, EVENTID as the second key, and the list of AMOUNTs as the final value.
If "row" is a tuple or list and "rows" is a list of row tuples/lists:
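from collections import defaultdict

# Outer key: USERID (row[0]); inner key: EVENTID (row[2]); value: list of AMOUNTs
my_dict = defaultdict(dict)
for row in rows:
    try:
        my_dict[row[0]][row[2]].append(row[-1])
    except KeyError:
        # First time this EVENTID is seen for this USERID
        my_dict[row[0]][row[2]] = [row[-1]]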
Then you can compute the sums. If this is not what you're trying to do, the logic still holds: the main concept is to create your dictionary with the proper hierarchical structure.
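As a rough end-to-end sketch of that idea, the snippet below reads the sample rows from the question with the standard `csv` module, keeps a running total per USERID and EVENTID, and prints the totals sorted by USERID. The helper name `sum_by_user_and_event` and the inline `SAMPLE` string are made up for illustration, and it assumes the dates are always in `YYYY-MM-DD` form so they can be compared as strings. USERNAME is left out because the sample data lists two different names under `E002`.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied verbatim from the question (including the "001" id).
SAMPLE = """USERID, USERNAME, EVENTID, DATE, AMOUNT
001,User1,EV-001,2020-01-01,1000
E001,User1,EV-001,2021-01-01,1000
E001,User1,EV-002,2020-03-01,300
E001,User1,EV-002,2020-04-01,500
E002,User2,EV-003,2020-01-02,100
E002,User2,EV-003,2020-02-02,200
E002,User3,EV-003,2020-03-02,300
E003,User4,EV-004,2024-01-01,100
"""


def sum_by_user_and_event(lines, input_date):
    """Hypothetical helper: total AMOUNT per USERID and EVENTID up to input_date."""
    totals = defaultdict(lambda: defaultdict(int))
    # skipinitialspace drops the blanks after the commas in the header line.
    for row in csv.DictReader(lines, skipinitialspace=True):
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        counted = int(row["AMOUNT"]) if row["DATE"] <= input_date else 0
        # Adding 0 still registers the pair, so later events appear with sum 0.
        totals[row["USERID"]][row["EVENTID"]] += counted
    return totals


totals = sum_by_user_and_event(io.StringIO(SAMPLE), "2020-04-01")
for user_id in sorted(totals):
    for event_id in sorted(totals[user_id]):
        print(f"{user_id},{event_id},{totals[user_id][event_id]}")
```

With the sample rows and an input date of 2020-04-01, `E001`/`EV-002` totals 800 and `E003`/`EV-004` shows up with 0, since its only row is dated after the cutoff. Keeping a running integer instead of a list of amounts avoids a second summing pass; if you need the individual amounts later, store lists as in the snippet above and call `sum()` at the end.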
I'm not sure why you do not want to use pandas, but it is probably the right tool for the job in this case.
Upvotes: 1