raven
raven

Reputation: 527

How to find occurrences of values in a different column in Python?

I have a pandas DataFrame df that portrays edges of a directed acyclic graph, sorted by Target:

Source       Target
C            A
D            A
A            B
C            B
D            B
E            B
E            C
C            D
E            D

I would like to add a column Weight based on occurrences of values.

Weight should illustrate the number of appearance of the Target value in Target divided by the number of appearance of the Source value in Target.

In other words, the first row of the example should have the Weight of 2/1 = 2, since A appears twice in Target where C appears only once in Target.

I have first tried

df.apply(pd.Series.value_counts)

but the problem is my actual DataFrame is extremely large, so I am not able to manually search for each occurrence value from the outcome and make a quotient. I have also tried to write two new columns that signify the values I need, then to write a final column that consists of what I want:

df['tfreq'] = df.groupby('Target')['Target'].transform('count')
df['sfreq'] = df.groupby('Source')['Target'].transform('count')

but it seems like my second line of code returns the occurrences of Source values in Source column instead of Target column.

Are there any insights on this problem?

Upvotes: 0

Views: 170

Answers (1)

Erfan
Erfan

Reputation: 42906

Use value_counts with map. Then divide them:

val_counts = df['Target'].value_counts()

counts1 = df['Target'].map(val_counts)
counts2 = df['Source'].map(val_counts)

df['Weights'] = counts1.div(counts2) # same as counts1 / counts2

Output

  Source Target  Weights
0      C      A      2.0
1      D      A      1.0
2      A      B      2.0
3      C      B      4.0
4      D      B      2.0
5      E      B      NaN
6      E      C      NaN
7      C      D      2.0
8      E      D      NaN

note: we get NaN because E does not occur in column Target

Upvotes: 2

Related Questions