Reputation: 815
I have some data as follows:
+--------+------+
| Reason | Keys |
+--------+------+
| x      | a    |
| y      | a    |
| z      | a    |
| y      | b    |
| z      | b    |
| x      | c    |
| w      | d    |
| x      | d    |
| w      | d    |
+--------+------+
I want to get the Reason corresponding to the first occurrence of each Key. Like here, I should get Reasons x, y, x, w for Keys a, b, c, d respectively. After that, I want to compute the percentage of each Reason, as in a metric for how many times each Reason occurs. Thus x = 2/4 = 50%, and w, y = 25% each.
For the percentage, I think I can use something like value_counts(normalize=True) * 100, based on the previous step. What is a good way to proceed?
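For reference, the sample data can be rebuilt as a DataFrame like this (a minimal sketch; the variable name df is just an assumption):
import pandas as pd
# rebuild the sample table above
df = pd.DataFrame({
    "Reason": ["x", "y", "z", "y", "z", "x", "w", "x", "w"],
    "Keys":   ["a", "a", "a", "b", "b", "c", "d", "d", "d"],
})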
Upvotes: 0
Views: 138
Reputation: 323366
You can use drop_duplicates on the Keys column to keep the first row per Key:
df.drop_duplicates(['Keys'])
Out[207]:
  Reason Keys
0      x    a
3      y    b
5      x    c
6      w    d
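For the percentage step from the question, a minimal sketch that builds on this (using the df from the question and the asker's value_counts idea):
firsts = df.drop_duplicates(['Keys'])                  # first row per Key
firsts['Reason'].value_counts(normalize=True) * 100    # expected: x 50.0, w 25.0, y 25.0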
Upvotes: 0
Reputation: 4268
You are right about the second step, and the first step can be achieved with:
summary = df.groupby("Keys").first()
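For reference, a sketch of what this gives on the sample data, plus the percentage step the asker mentioned (groupby returns a frame indexed by Keys):
summary = df.groupby("Keys").first()
#      Reason
# Keys
# a         x
# b         y
# c         x
# d         w
summary["Reason"].value_counts(normalize=True) * 100   # expected: x 50.0, w 25.0, y 25.0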
Upvotes: 1