srkdb

Reputation: 815

Pandas inter-column referencing

I have some data as follows:

+--------+------+
| Reason | Keys |
+--------+------+
| x      | a    |
| y      | a    |
| z      | a    |
| y      | b    |
| z      | b    |
| x      | c    |
| w      | d    |
| x      | d    |
| w      | d    |
+--------+------+

I want to get the Reason corresponding to the first occurrence of each Key. For example, here I should get Reasons x, y, x, w for Keys a, b, c, d respectively. After that, I want to compute the percentage of each Reason, i.e. how often each Reason occurs among those first occurrences. Thus x = 2/4 = 50%, and w and y are 25% each.
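To pin down the two steps, here is a minimal plain-Python sketch (no pandas) that reproduces the numbers above. It assumes the rows are processed in table order; the names `rows`, `first_reason` and `percentages` are illustrative only.

```python
from collections import Counter

# Rows from the table above, in order: (Reason, Keys)
rows = [
    ("x", "a"), ("y", "a"), ("z", "a"),
    ("y", "b"), ("z", "b"),
    ("x", "c"),
    ("w", "d"), ("x", "d"), ("w", "d"),
]

# Step 1: keep the Reason of the first row seen for each Key.
# setdefault only stores a value the first time a key appears.
first_reason = {}
for reason, key in rows:
    first_reason.setdefault(key, reason)

# Step 2: percentage of Keys whose first Reason is each value.
counts = Counter(first_reason.values())
total = len(first_reason)
percentages = {reason: 100 * n / total for reason, n in counts.items()}

print(first_reason)  # {'a': 'x', 'b': 'y', 'c': 'x', 'd': 'w'}
print(percentages)   # {'x': 50.0, 'y': 25.0, 'w': 25.0}
```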

For the percentage, I think I can use something like value_counts(normalize=True) * 100 on the result of the previous step. What is a good way to proceed?
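A quick sketch of the percentage step, assuming step 1 has already produced a Series of first Reasons indexed by Key (here hard-coded from the expected result stated in the question, so `first_reasons` is a hypothetical stand-in):

```python
import pandas as pd

# Hypothetical step-1 result: first Reason for Keys a, b, c, d.
first_reasons = pd.Series(["x", "y", "x", "w"], index=["a", "b", "c", "d"], name="Reason")

# normalize=True returns fractions of the total instead of raw counts;
# multiplying by 100 turns them into percentages.
pct = first_reasons.value_counts(normalize=True) * 100
print(pct)
```

The order of tied values (y and w, both 25%) in the printed result is not something to rely on; look values up by label, e.g. `pct["x"]`.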

Upvotes: 0

Views: 138

Answers (2)

BENY

Reputation: 323366

You can use drop_duplicates on the Keys column:

df.drop_duplicates(['Keys'])
Out[207]: 
  Reason Keys
0      x    a
3      y    b
5      x    c
6      w    d
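Which column is passed as the subset matters. The sketch below, built on the question's sample data, contrasts deduplicating on Reason (first row of each distinct Reason) with deduplicating on Keys (first row of each distinct Key, which is what the question asks for), then applies the `value_counts` step from the question.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "Reason": ["x", "y", "z", "y", "z", "x", "w", "x", "w"],
    "Keys":   ["a", "a", "a", "b", "b", "c", "d", "d", "d"],
})

# subset="Reason": first row of each distinct Reason -> x, y, z, w
by_reason = df.drop_duplicates(subset="Reason")

# subset="Keys": first row of each distinct Key -> x, y, x, w
by_key = df.drop_duplicates(subset="Keys", keep="first")

# Percentage of Keys whose first Reason is each value.
pct = by_key["Reason"].value_counts(normalize=True) * 100
print(by_key)
print(pct)
```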

Upvotes: 0

GZ0

Reputation: 4268

You are right about the second step, and the first step can be achieved with

summary = df.groupby("Keys").first()
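Putting both steps together, a minimal end-to-end sketch on the question's sample data might look like this. One caveat worth hedging: `GroupBy.first()` returns the first non-null value of each column, which equals the first row here only because the sample has no missing values; with NaNs, `drop_duplicates(subset="Keys")` keeps the actual first row.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "Reason": ["x", "y", "z", "y", "z", "x", "w", "x", "w"],
    "Keys":   ["a", "a", "a", "b", "b", "c", "d", "d", "d"],
})

# Step 1: first Reason per Key; the group keys become the index (a, b, c, d).
summary = df.groupby("Keys").first()

# Step 2: percentage of each Reason among the per-Key results.
pct = summary["Reason"].value_counts(normalize=True) * 100
print(pct)
```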

Upvotes: 1
