Reputation: 3379
Say I have the following dataframe of addresses and phone numbers:
>>> import pandas as pd
>>> df=pd.DataFrame()
>>> df['Address']=['1 Main St','1 Main St','45 Spruce St','45 Spruce St','100 Green St','100 Green St', '500 Washington','500 Washington']
>>> df['Phone']=['555-5555','555-5555','666-6666','666-6667','777-7777',None,None,None]
>>> df
Address Phone
0 1 Main St 555-5555
1 1 Main St 555-5555
2 45 Spruce St 666-6666
3 45 Spruce St 666-6667
4 100 Green St 777-7777
5 100 Green St None
6 500 Washington None
7 500 Washington None
>>>
I want to groupby both the Address and Phone fields and get a count of addresses in each group. However, I have an added criteria: groups with the same address, but missing (not different) phone numbers should be considered as having the same address and phone numbers. In my example below, this means that the last record should be assumed to have phone number 777-7777 as well. Additionally, groups that have the same address, but neither have phone numbers should be grouped as well (Washington in my example).
When I groupby Address and Phone, None values are dropped:
>>> df.groupby(['Address','Phone']).size().reset_index().rename(columns={0:'Size'})
Address Phone Size
0 1 Main St 555-5555 2
1 100 Green St 777-7777 1
2 45 Spruce St 666-6666 1
3 45 Spruce St 666-6667 1
>>>
Essentially, I am looking to group by Address and Phone when all records within an address group have phone numbers, and just by address when phone values are missing. My desired output would be:
Address Phone
0 1 Main St 555-5555
1 45 Spruce St 666-6666
2 45 Spruce St 666-6667
3 100 Green St 777-7777
4 500 Washington None
>>>
Any suggestions as to how I could do this?
Upvotes: 2
Views: 289
Reputation: 76346
Nice question.
How about the following:
def count_phones(g):
distinct = len(g.dropna().unique())
return distinct if distinct else 1
>>> df.Phone.groupby(df.Address).agg(count_phones)
Address
1 Main St 1
100 Green St 1
45 Spruce St 2
500 Washington 1
dtype: int64
Grouping is just by address.
Within each group, the phone numbers' non-null unique values are counted.
If the number is 0, then all were None
, and we return 1.
Upvotes: 1