Reputation: 8554
I have a DataFrame df:
id name number
1 sam 76
2 sam 8
2 peter 8
4 jack 2
I would like to group by the 'id' column and count the number of unique (name, number) pairs within each group. The desired result:
id count(name-number)
1 1
2 2
4 1
I have tried this, but it does not work:
df.groupby('id')[('number','name')].nunique().reset_index()
Upvotes: 5
Views: 29314
Reputation: 1571
You can just combine two groupbys to get the desired result.
import pandas
df = pandas.DataFrame({"id": [1, 2, 2, 4], "name": ["sam", "sam", "peter", "jack"], "number": [8, 8, 8, 2]})
group = df.groupby(['id','name','number']).size().groupby(level=0).size()
The first groupby counts each combination of id, name and number, which leaves exactly one row per unique combination. The second groupby counts how many of those unique combinations occur per id (you can use the fact that the first groupby put that column in the index, at level 0).
The result will be a Series. If you want a DataFrame with the right column name (as shown in your desired result), you can name the count column when resetting the index:
group = df.groupby(['id','name','number']).size().groupby(level=0).size().reset_index(name='count(name-number)')
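As a quick check, the two-step approach can be run against the data from the question. This is only a minimal sketch: the variable names `pairs`, `counts` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Data from the question: id 2 has two distinct (name, number) pairs.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "jack"],
    "number": [76, 8, 8, 2],
})

# Step 1: one entry per distinct (id, name, number) combination.
pairs = df.groupby(["id", "name", "number"]).size()

# Step 2: count how many of those combinations fall under each id
# (id is level 0 of the index produced by step 1).
counts = pairs.groupby(level=0).size()

# Turn the index back into a column and name the count column.
result = counts.reset_index(name="count(name-number)")
print(result)
```

Printing `pairs` before the second step is a handy way to see which combinations are being counted.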
Upvotes: 9
Reputation: 11460
To get a list of unique values for column combinations:
grouped = df.groupby('name').number.unique()
for k, v in grouped.items():
    print(k)
    print(v)
output:
jack
[2]
peter
[8]
sam
[76 8]
To get number of values of one column based on another:
df.groupby('name').number.value_counts().unstack().fillna(0)
output:
number 2 8 76
name
jack 1.0 0.0 0.0
peter 0.0 1.0 0.0
sam 0.0 1.0 1.0
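The counts table comes out as floats because `fillna(0)` runs after missing cells have already become NaN. A possible variation, sketched here with the question's data, passes `fill_value=0` to `unstack` so the counts stay integers. The variable name `table` is illustrative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "jack"],
    "number": [76, 8, 8, 2],
})

# Count each number per name, then pivot the numbers into columns.
# fill_value=0 fills absent (name, number) pairs without introducing NaN.
table = df.groupby("name").number.value_counts().unstack(fill_value=0)
print(table)
```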
Upvotes: 1
Reputation: 1
Try selecting the pair columns, dropping duplicate rows within each group, and counting what is left:
df.groupby('id')[['name', 'number']].apply(lambda x: x.drop_duplicates().shape[0]).reset_index()
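The same de-duplication idea can also be sketched without `apply`, by dropping duplicate rows first and then counting rows per id. The example below uses the question's data plus one repeated row, which is an assumption added here purely to show the effect of de-duplication. The column name `count` is also an illustrative choice.

```python
import pandas as pd

# Question's data plus one repeated (2, "sam", 8) row (added for illustration).
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 4],
    "name": ["sam", "sam", "sam", "peter", "jack"],
    "number": [76, 8, 8, 8, 2],
})

# Keep one row per distinct (id, name, number), then count rows per id.
result = (
    df.drop_duplicates(subset=["id", "name", "number"])
      .groupby("id")
      .size()
      .reset_index(name="count")
)
print(result)
```

The repeated row is counted only once, so id 2 still reports two unique pairs.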
Upvotes: 0
Reputation: 1199
You can do:
import pandas
df = pandas.DataFrame({"id": [1, 2, 3, 4], "name": ["sam", "sam", "peter", "jack"], "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])
print(g.groups)
which gives:
{('jack', 2): [3], ('peter', 8): [2], ('sam', 8): [0, 1]}
To get the number of entries per pair, you can do:
for p in g.groups:
    print(p, "has", len(g.groups[p]), "entries")
which gives:
('peter', 8) has 1 entries
('jack', 2) has 1 entries
('sam', 8) has 2 entries
Update:
The OP asked for the result as a DataFrame. One way to get this is to use aggregate with the len function, which returns a DataFrame with the number of entries per pair:
d = g.aggregate(len)
print(d.reset_index().rename(columns={"id": "num_entries"}))
gives:
name number num_entries
0 jack 2 1
1 peter 8 1
2 sam 8 2
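A shorter route to the same per-pair table, sketched here as an alternative rather than as the answer's own method, is to call `size()` on the grouped object and name the count column directly. The data matches the DataFrame defined in this answer.

```python
import pandas as pd

# Same data as in this answer.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["sam", "sam", "peter", "jack"],
    "number": [8, 8, 8, 2],
})

# size() counts the rows in each (name, number) group;
# reset_index(name=...) names the count column in the resulting DataFrame.
d = df.groupby(["name", "number"]).size().reset_index(name="num_entries")
print(d)
```

This avoids the rename step, since the count column is not derived from the `id` column.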
Upvotes: 5