Iolkos
Iolkos

Reputation: 229

Python dataframes count number of occurences in two columns

Two dataframe columns:

data['IP']          data['domain']
10.20.30.40         example.org 
10.20.30.40         example.org
10.20.30.40         example.org
10.20.30.40         example.org
1.2.3.4             google.com
1.2.3.4             google.com
1.2.3.4             google.com
200.100.200.100     yahoo.com
200.100.200.100     yahoo.com
9.8.7.6             random.com

I want to find an efficient way that counts how many times each domain is being mapped to the same IP address. Then, if the number of occurences is more than two(2) , take the specific domains ( BUT only the unique values) and move them to another dataframe or column.

So the output could be something like:

[Occurences]    [To be processed]
4               example.org
4               google.com
4
4
3               
3
3

I have tried different things , like Graphs and then taking the degree of the nodes , and pivot tables to express the number , but I want an efficient way that will allow me to continue with the processing of the domain after the if occur>2 statement.

All should be implemented with python panda dataframes!

Upvotes: 2

Views: 176

Answers (1)

EdChum
EdChum

Reputation: 393953

The following performs a groupby on 'domain' and then calls value_counts on 'IP' address, we then filter this and reset the index and rename the columns so they are more meaningful:

In [58]:
gp = df.groupby('domain')['IP'].value_counts()
df1 = gp[gp > 2].reset_index()
df1.rename(columns={'level_1': 'IP', 0:'Occurences'}, inplace=True)
df1

Out[58]:
        domain           IP  Occurences
0  example.org  10.20.30.40           4
1   google.com      1.2.3.4           3

Upvotes: 3

Related Questions