Reputation: 527
I have a pandas
DataFrame df
that portrays edges of a directed acyclic graph, sorted by Target
:
Source Target
C A
D A
A B
C B
D B
E B
E C
C D
E D
I would like to add a column Weight
based on occurrences of values.
Weight
should illustrate the number of appearance of the Target
value in Target
divided by the number of appearance of the Source
value in Target
.
In other words, the first row of the example should have the Weight
of 2/1 = 2, since A
appears twice in Target
where C
appears only once in Target
.
I have first tried
df.apply(pd.Series.value_counts)
but the problem is my actual DataFrame is extremely large, so I am not able to manually search for each occurrence value from the outcome and make a quotient. I have also tried to write two new columns that signify the values I need, then to write a final column that consists of what I want:
df['tfreq'] = df.groupby('Target')['Target'].transform('count')
df['sfreq'] = df.groupby('Source')['Target'].transform('count')
but it seems like my second line of code returns the occurrences of Source
values in Source
column instead of Target
column.
Are there any insights on this problem?
Upvotes: 0
Views: 170
Reputation: 42906
Use value_counts
with map
. Then divide them:
val_counts = df['Target'].value_counts()
counts1 = df['Target'].map(val_counts)
counts2 = df['Source'].map(val_counts)
df['Weights'] = counts1.div(counts2) # same as counts1 / counts2
Output
Source Target Weights
0 C A 2.0
1 D A 1.0
2 A B 2.0
3 C B 4.0
4 D B 2.0
5 E B NaN
6 E C NaN
7 C D 2.0
8 E D NaN
note: we get NaN
because E
does not occur in column Target
Upvotes: 2