Python Pandas Groupby not working as expected

Question

I have a single dataframe like the following:

               ip              domain
0  46.101.214.145  cnwkabrnyld1c0[.]com
1  46.101.214.145        anfj63m[.]com
2  46.101.214.145       anf3xnem[.]com
3   69.195.129.70  cnwkabrnyld1c0[.]com
4   69.195.129.70        anfj63ms[.]com

I am trying to groupby the "ip" column such that it will aggregate the duplicates onto the same line.

Desired output:

               ip              domain
0  46.101.214.145  cnwkabrnyld1c0[.]com | anfj63m[.]com | anf3xnem[.]com
3   69.195.129.70  cnwkabrnyld1c0[.]com
4   69.195.129.70        anfj63ms[.]com

This seems pretty straight forward based on the countless SOF and Panda documentation I've been reading.

Here's my code:

pDNSPanda = pd.read_csv('/tmp/pDNSCSV.csv', names=['ip', 'domain'])
g = pDNSPanda.groupby("ip").agg('|'.join)

Here's what I actually get:

                 domain
ip
0.0.0.0       ip|domain
1.1.1.200     ip|domain
1.148.56.163  ip|domain
1.167.81.129  ip|domain
1.193.28.230  ip|domain

Edit:

It was suggested that I try my own code against the above data set.

test.csv
46.101.214.145  cnwkabrnyld1c0[.]com
46.101.214.145        anfj63m[.]com
46.101.214.145       anf3xnem[.]com
69.195.129.70  cnwkabrnyld1c0[.]com
69.195.129.70        anfj63ms[.]com

>>> import pandas as pd
>>> p = pd.read_csv('/tmp/test.csv', names=['ip', 'domain'])
>>> g = p.groupby("ip").agg("|".join)
>>> print g
                                         domain
ip
46.101.214.145        anfj63m[.]com   ip|domain
46.101.214.145       anf3xnem[.]com   ip|domain
46.101.214.145  cnwkabrnyld1c0[.]com  ip|domain
69.195.129.70        anfj63ms[.]comp|domain
69.195.129.70  cnwkabrnyld1c0[.]com   ip|domain

Alexander · Accepted Answer

You need domain between the groupby and agg commands. Add spacing around you pipe join if desired, e.g. ' | '.join(...).

pDNSPanda.groupby("ip").domain.agg('|'.join)

ip
46.101.214.145    cnwkabrnyld1c0[.]com|anfj63m[.]com|anf3xnem[.]com
69.195.129.70                   cnwkabrnyld1c0[.]com|anfj63ms[.]com
Name: domain, dtype: object

If you have floats in your domains, it is best to convert them to strings.

df['domain'] = df.domain.astype(str)

Python Pandas Groupby not working as expected

Answers (2)

Related Questions