How can I get the number of occurrences of 2 rows in a DataFrame?

Question

I am trying to create a network graph. My desired output should have 3 columns: from, to, value.

import pandas as pd
data = [
    ['nyc', 'la'], 
    ['nyc', 'atl'], 
    ['nyc', 'la'], 
    ['nyc', 'la'], 
    ['nyc', 'mia'], 
    ['nyc', 'wash'], 
    ['nyc', 'la'], 
    ['dtr', 'la']
    ] 

df = pd.DataFrame(data, columns = ['from', 'to'])

Desired outcome:

pd.DataFrame({
        "from": ['nyc', 'nyc', 'nyc', 'nyc', 'dtr'],
        "to": ['la', 'atl', 'mia', 'wash', 'la'],
        "value": [4, 1, 1, 1, 1]})

How can I count the occurrences of each combination of values in 2 columns of a DataFrame?

When I do df.groupby(['from', 'to']).count(), I get an empty DataFrame:

>>> df.groupby(['from', 'to']).count()                                                        
Empty DataFrame
Columns: []
Index: [(dtr, la), (nyc, atl), (nyc, la), (nyc, mia), (nyc, wash)]

Quang Hoang · Accepted Answer

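count() tallies the non-null values in the columns that are not grouping keys. Since both of your columns are keys, nothing is left to count and the result is empty. You can use groupby().value_counts() instead: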

df.groupby('from')['to'].value_counts().reset_index(name='value')

Output:

  from    to  value
0  dtr    la      1
1  nyc    la      4
2  nyc   atl      1
3  nyc   mia      1
4  nyc  wash      1
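
As an alternative sketch (not part of the approach above), grouping on both columns and calling size() should give the same counts, because size() counts rows per group rather than values in the remaining columns. The variable name edges is only illustrative.

```python
import pandas as pd

data = [
    ['nyc', 'la'], ['nyc', 'atl'], ['nyc', 'la'], ['nyc', 'la'],
    ['nyc', 'mia'], ['nyc', 'wash'], ['nyc', 'la'], ['dtr', 'la'],
]
df = pd.DataFrame(data, columns=['from', 'to'])

# size() counts the rows in each (from, to) group, so it works even
# though no non-key column is left for count() to tally.
# "edges" is a hypothetical name chosen for this example.
edges = df.groupby(['from', 'to']).size().reset_index(name='value')
print(edges)
```

Here the rows come back sorted by the group keys rather than by count, so sort on value afterwards if you need the heaviest edges first.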
