Reputation: 815
I have some data as follows:
+--------+------+
| Reason | Keys |
+--------+------+
| x      | a    |
| y      | a    |
| z      | a    |
| y      | b    |
| z      | b    |
| x      | c    |
| w      | d    |
| x      | d    |
| w      | d    |
+--------+------+
I want to get the Reason corresponding to the first occurrence of each Key. Like here, I should get Reasons x, y, x, w for Keys a, b, c, d respectively. After that, I want to compute the percentage of each Reason, as in a metric for how many times each Reason occurs. Thus x = 2/4 = 50%, and w, y = 25% each.
For the percentage, I think I can use something like value_counts(normalize=True) * 100, based on the previous step. What is a good way to proceed?
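For reference, the sample data can be rebuilt as a DataFrame like this (a minimal sketch; the variable name df is just an assumption):
import pandas as pd
# rebuild the sample table above
df = pd.DataFrame({
    "Reason": ["x", "y", "z", "y", "z", "x", "w", "x", "w"],
    "Keys":   ["a", "a", "a", "b", "b", "c", "d", "d", "d"],
})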
Upvotes: 0
Views: 138
Reputation: 323366
You can use drop_duplicates on the Keys column to keep the first row per Key:
df.drop_duplicates(['Keys'])
Out[207]:
  Reason Keys
0      x    a
3      y    b
5      x    c
6      w    d
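For the percentage step from the question, a minimal sketch that builds on this (using the df from the question and the asker's value_counts idea):
firsts = df.drop_duplicates(['Keys'])                  # first row per Key
firsts['Reason'].value_counts(normalize=True) * 100    # expected: x 50.0, w 25.0, y 25.0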
Upvotes: 0
Reputation: 4268
You are right about the second step, and the first step can be achieved with:
summary = df.groupby("Keys").first()
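For reference, a sketch of what this gives on the sample data, plus the percentage step the asker mentioned (groupby returns a frame indexed by Keys):
summary = df.groupby("Keys").first()
#      Reason
# Keys
# a         x
# b         y
# c         x
# d         w
summary["Reason"].value_counts(normalize=True) * 100   # expected: x 50.0, w 25.0, y 25.0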
Upvotes: 1