seder163
seder163

Reputation: 63

Stata Collapsing by first observation date when there are multiple date observations per ID

I am working with a dataset that has purchases per date (called ItemNum) on multiple dates across 2800 individuals. Each Item is given its own line, so if an individual has purchased two items on a date, that date will appear twice. I don't care how many items were purchased on a date (with each date representing one trip), but rather the mean number of trips made across the 2800 individuals (For about 18230 lines of data). My data looks like this:

+---+----------+-------+---------------------- ---+
|ID |    Date  |ItemNum|        ItemDescript      |
| 1 |01/22/2010|   1   |Description of the item   |
| 1 |01/22/2010|   2   |Description of other item |
| 1 |07/19/2013|   1   |                          | 
| 2 |06/04/2012|   1   |                          |
| 2 |02/02/2013|   1   |                          |
| 2 |11/13/2013|   1   |                          |
+---+----------+-------+---------------------- ---+

In the above table, person 1 made two trips and three item purchases (because two dates are shown), person 2 made three trips. I am interested in the average number of trips across all people, but first I need to collapse it down to unique dates. So I know I need to collapse on the date, but when I do

collapse (mean) ItemNum (first) Date, by(ID)

it just takes the first date that the ID shows up, not the first occurrence of each unique date.

The next issue is that once it's collapsed, I need to take the mean of the count of the dates, not the date itself, which is also where I seem to be getting tripped up.

Upvotes: 0

Views: 1100

Answers (2)

user4690969
user4690969

Reputation:

Or perhaps something like

clear
input ID str16 dt ItemNum
1 "01/22/2010"  1
1 "01/22/2010"  2
1 "07/19/2013"  1
end
generate Date = daily(dt,"MDY")
egen trip = tag(ID Date)
collapse (sum) trip, by(ID)
summarize trip

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        trip |          1           2           .          2          2

if what you are looking for is found in "Mean" - a single number giving the average number of trips made by the 2800 individuals (1 individual with the limited sample data given).

Upvotes: 1

ℕʘʘḆḽḘ
ℕʘʘḆḽḘ

Reputation: 19375

are you trying to do the following?

collapse (mean) ItemNum, by(ID Date) fast

Upvotes: 1

Related Questions