I am trying to create a network graph. The desired output should have three columns: from, to, and value, where value is the number of times each (from, to) pair occurs.
import pandas as pd
data = [
['nyc', 'la'],
['nyc', 'atl'],
['nyc', 'la'],
['nyc', 'la'],
['nyc', 'mia'],
['nyc', 'wash'],
['nyc', 'la'],
['dtr', 'la']
]
df = pd.DataFrame(data, columns = ['from', 'to'])
Desired outcome:
pd.DataFrame({
"from": ['nyc', 'nyc', 'nyc', 'nyc', 'dtr'],
"to": ['la', 'atl', 'mia', 'wash', 'la'],
"value": [4, 1, 1, 1, 1]})
How can I count how often each combination of the two columns occurs in a DataFrame?
When I run df.groupby(['from', 'to']).count(), I get an empty DataFrame:
>>> df.groupby(['from', 'to']).count()
Empty DataFrame
Columns: []
Index: [(dtr, la), (nyc, atl), (nyc, la), (nyc, mia), (nyc, wash)]
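The empty result can be reproduced and worked around in a small sketch. It reuses the data from the question; the constant column named one is a hypothetical helper added only so that count() has a non-key column to count.

```python
import pandas as pd

# Same edge list as in the question
data = [
    ['nyc', 'la'], ['nyc', 'atl'], ['nyc', 'la'], ['nyc', 'la'],
    ['nyc', 'mia'], ['nyc', 'wash'], ['nyc', 'la'], ['dtr', 'la'],
]
df = pd.DataFrame(data, columns=['from', 'to'])

# Both columns are grouping keys, so count() has no value columns left:
# the result has one row per group but zero columns.
empty = df.groupby(['from', 'to']).count()
print(empty.shape)

# With a hypothetical helper column 'one', count() has something to count
# and returns the number of rows in each (from, to) group.
counts = df.assign(one=1).groupby(['from', 'to'])['one'].count()
print(counts)
```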
You can use groupby().value_counts():
df.groupby('from')['to'].value_counts().reset_index(name='value')
Output:
   from    to  value
0   dtr    la      1
1   nyc    la      4
2   nyc   atl      1
3   nyc   mia      1
4   nyc  wash      1
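A related option, assuming pandas 1.1 or later, is DataFrame.value_counts, which counts unique combinations of the listed columns directly without a groupby. This is a sketch using the question's data; like Series.value_counts it sorts by count in descending order by default.

```python
import pandas as pd

# Same edge list as in the question
data = [
    ['nyc', 'la'], ['nyc', 'atl'], ['nyc', 'la'], ['nyc', 'la'],
    ['nyc', 'mia'], ['nyc', 'wash'], ['nyc', 'la'], ['dtr', 'la'],
]
df = pd.DataFrame(data, columns=['from', 'to'])

# DataFrame.value_counts returns a Series indexed by (from, to);
# reset_index(name='value') turns it into the three-column edge list.
edges = df.value_counts(['from', 'to']).reset_index(name='value')
print(edges)
```

Because the result is sorted by frequency, the most common edge (nyc to la) comes first.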
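You're probably looking for df.groupby(['from', 'to']).size(). count() counts non-null values in the columns that are not grouping keys, and here there are none, which is why the result is empty; size() counts the rows in each group instead.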
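A minimal sketch of the size() approach, using the question's data, that also turns the result into the from/to/value layout with reset_index(name='value'). Unlike value_counts, the rows come out sorted by the group keys rather than by count.

```python
import pandas as pd

# Same edge list as in the question
data = [
    ['nyc', 'la'], ['nyc', 'atl'], ['nyc', 'la'], ['nyc', 'la'],
    ['nyc', 'mia'], ['nyc', 'wash'], ['nyc', 'la'], ['dtr', 'la'],
]
df = pd.DataFrame(data, columns=['from', 'to'])

# size() returns a Series of row counts indexed by (from, to);
# reset_index moves the keys back into columns and names the counts 'value'.
edges = df.groupby(['from', 'to']).size().reset_index(name='value')
print(edges)
```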