Reputation: 2646
Input data frame:
id value
0 0 10.2
1 1 5.7
2 2 7.4
3 2 2.5
4 1 2.6
5 3 1.6
6 2 2.9
7 0 3.6
8 2 2.7
Expected Output:
format: [(id, count_of_values, [values as a list])], i.e. like this:
[(0, 2, [10.2, 3.6]), (1, 2, [5.7, 2.6]), ...]
So far I am able to get the first two elements, i.e. the id and its count, as a tuple. I also need the data in reverse-sorted order by count:
import operator
from collections import Counter

id_list = df.id.tolist()
count = Counter(id_list)
ID_count_list = sorted(count.items(), key=operator.itemgetter(1), reverse=True)
What is the most efficient way to also get the values, as shown in the expected output?
Upvotes: 2
Views: 882
Reputation: 61910
You could use groupby + apply to do it all in one step, matching the desired output:
result = df.groupby('id')['value'].apply(lambda x: (x.name, x.size, x.tolist())).tolist()
print(result)
Output
[(0, 2, [10.2, 3.6]), (1, 2, [5.7, 2.6]), (2, 4, [7.4, 2.5, 2.9, 2.7]), (3, 1, [1.6])]
Given the output above, you can sort it by count like this:
import operator

result = [(0, 2, [10.2, 3.6]), (1, 2, [5.7, 2.6]), (2, 4, [7.4, 2.5, 2.9, 2.7]), (3, 1, [1.6])]
s = sorted(result, key=operator.itemgetter(1), reverse=True)
print(s)
Output (sorted)
[(2, 4, [7.4, 2.5, 2.9, 2.7]), (0, 2, [10.2, 3.6]), (1, 2, [5.7, 2.6]), (3, 1, [1.6])]
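The grouping and the sort can also be chained into a single expression. A minimal sketch, where the DataFrame literal is reconstructed from the question's sample data:

```python
import operator

import pandas as pd

df = pd.DataFrame({
    'id':    [0, 1, 2, 2, 1, 3, 2, 0, 2],
    'value': [10.2, 5.7, 7.4, 2.5, 2.6, 1.6, 2.9, 3.6, 2.7],
})

# Build (id, count, values) tuples per group, then reverse-sort by count.
result = sorted(
    df.groupby('id')['value']
      .apply(lambda x: (x.name, x.size, x.tolist()))
      .tolist(),
    key=operator.itemgetter(1),
    reverse=True,
)
print(result)
```

Since `sorted` is stable, groups with equal counts (ids 0 and 1 here) keep their original order.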
Upvotes: 2
Reputation: 164773
This is a groupby problem. If a list of lists is sufficient:
res = df.groupby('id')['value'].agg(['count', lambda x: x.tolist()])\
.reset_index().values.tolist()
print(res)
# [[0, 2, [10.2, 3.6]], [1, 2, [5.7, 2.6]],
# [2, 4, [7.4, 2.5, 2.9, 2.7]], [3, 1, [1.6]]]
For a list of tuples, there's an additional step:
res = list(map(tuple, res))
print(res)
# [(0, 2, [10.2, 3.6]), (1, 2, [5.7, 2.6]),
# (2, 4, [7.4, 2.5, 2.9, 2.7]), (3, 1, [1.6])]
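If pandas isn't required at all, plain Python with collections.defaultdict produces the same grouped, reverse-sorted tuples. A sketch, with the rows list reconstructed from the question's sample data:

```python
import operator
from collections import defaultdict

rows = [(0, 10.2), (1, 5.7), (2, 7.4), (2, 2.5), (1, 2.6),
        (3, 1.6), (2, 2.9), (0, 3.6), (2, 2.7)]

# Group values by id, preserving insertion order within each group.
groups = defaultdict(list)
for id_, value in rows:
    groups[id_].append(value)

# Build (id, count, values) tuples, reverse-sorted by count.
res = sorted(
    ((id_, len(vals), vals) for id_, vals in groups.items()),
    key=operator.itemgetter(1),
    reverse=True,
)
print(res)
# [(2, 4, [7.4, 2.5, 2.9, 2.7]), (0, 2, [10.2, 3.6]), (1, 2, [5.7, 2.6]), (3, 1, [1.6])]
```

For data that already lives in a DataFrame the groupby answers above are simpler; this avoids the pandas dependency when the input is plain Python records.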
Upvotes: 1