Reputation: 3582
I have a Pandas DataFrame
which has a DATE
column and a userid
column like so:
userid DATE
0 vDKE27PF6z 2016-05-26
1 z6XJewg04Y 2016-05-25
2 e5Fsyak59l 2016-05-25
3 7SaYpGRtMI 2016-05-25
4 KGmTH0XOMz 2016-05-25
There are many distinct DATE
values and for each DATE
a particular userid
can appear twice. For each date, I'd like to calculate the number of rows that correspond to that DATE
divided by the number of unique userid
values that appear for a given DATE
. What's the best way to do this? I've tried various permutations of groupby
, but these don't go where I want to be.
Upvotes: 0
Views: 2327
Reputation: 25199
Suppose you have
df
userid DATE
0 vDKE27PF6z 2016-05-26
1 z6XJewg04Y 2016-05-25
2 e5Fsyak59l 2016-05-25
3 7SaYpGRtMI 2016-05-25
4 KGmTH0XOMz 2016-05-25
Then you may proceed as follows:
df.groupby('DATE',as_index=False).agg(lambda x: 1/len(np.unique(x)))
DATE userid
0 2016-05-25 0.25
1 2016-05-26 1.00
Upvotes: 0
Reputation: 13800
You should be able to do something like this:
df = pd.DataFrame({"userid": ["vDKE27PF6z", "vDKE27PF6z", "e5Fsyak59l", "7SaYpGRtMI", "KGmTH0XOMz" ], "DATE" : ["2016-05-25","2016-05-25","2016-05-25","2016-05-25","2016-05-26"]})
dates = df.groupby("DATE").count() # Number of occurrences of each date
dates["uniques"] = df.groupby("DATE").userid.nunique() # Number of unique users per day
dates["ratio"] = dates.userid/dates.uniques # Ratio of the two
Upvotes: 4
Reputation: 1030
pd.groupby
is the way to go. I think you're looking for the describe()
function. It gives simple statistics about the DataFrame in question.
data = pd.DataFrame({'userid':['vDKE27PF6z', 'z6XJewg04Y',
'e5Fsyak59l', '7SaYpGRtMI', 'KGmTH0XOMz'],
'DATE': ['2016-05-26', '2016-05-25',
'2016-05-25', '2016-05-25', '2016-05-25']})
for ind, grp in data.groupby('DATE'):
print(grp.describe())
The output is a pd.DataFrame object that has a count
of the number of objects in that group, and the number of unique
objects in that group. Sample output below:
>>> DATE userid
count 4 4
unique 1 4
top 2016-05-25 7SaYpGRtMI
freq 4 1
>>> DATE userid
count 1 1
unique 1 1
top 2016-05-26 vDKE27PF6z
freq 1 1
Notice for date '2016-05-25', there is one unique date (of course) and 4 unique user ids. More information about the describe()
function can be found here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html
Upvotes: 0
Reputation: 1375
df.groupby('DATE')['userid'].agg(lambda x:x.nunique()*1.0/x.count())
in python3 would be fine without the *1.0
Upvotes: 2