Reputation: 63
I am working with a dataset that has purchases per date (called ItemNum) on multiple dates across 2800 individuals. Each Item is given its own line, so if an individual has purchased two items on a date, that date will appear twice. I don't care how many items were purchased on a date (with each date representing one trip), but rather the mean number of trips made across the 2800 individuals (For about 18230 lines of data). My data looks like this:
+---+----------+-------+---------------------- ---+
|ID | Date |ItemNum| ItemDescript |
| 1 |01/22/2010| 1 |Description of the item |
| 1 |01/22/2010| 2 |Description of other item |
| 1 |07/19/2013| 1 | |
| 2 |06/04/2012| 1 | |
| 2 |02/02/2013| 1 | |
| 2 |11/13/2013| 1 | |
+---+----------+-------+---------------------- ---+
In the above table, person 1 made two trips and three item purchases (because two dates are shown), person 2 made three trips. I am interested in the average number of trips across all people, but first I need to collapse it down to unique dates. So I know I need to collapse
on the date, but when I do
collapse (mean) ItemNum (first) Date, by(ID)
it just takes the first date that the ID shows up, not the first occurrence of each unique date.
The next issue is that once it's collapsed, I need to take the mean of the count of the dates, not the date itself, which is also where I seem to be getting tripped up.
Upvotes: 0
Views: 1100
Reputation:
Or perhaps something like
clear
input ID str16 dt ItemNum
1 "01/22/2010" 1
1 "01/22/2010" 2
1 "07/19/2013" 1
end
generate Date = daily(dt,"MDY")
egen trip = tag(ID Date)
collapse (sum) trip, by(ID)
summarize trip
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
trip | 1 2 . 2 2
if what you are looking for is found in "Mean" - a single number giving the average number of trips made by the 2800 individuals (1 individual with the limited sample data given).
Upvotes: 1
Reputation: 19375
are you trying to do the following?
collapse (mean) ItemNum, by(ID Date) fast
Upvotes: 1