Reputation: 289
I have a dataset that shows how much was paid ("cenoz
" - cents per ounce) per product category during specific week and in a specific store.
clear
set more off
input week store cenoz category
1 1 2 1
1 1 4 2
1 1 3 3
1 2 5 1
1 2 7 2
1 2 8 3
2 1 4 1
2 1 1 2
2 1 10 3
2 2 3 1
2 2 4 2
2 2 7 3
3 1 5 1
3 1 3 2
3 2 5 1
3 2 4 2
end
I create a new variable cenoz3
that indicates how much on average was paid for category 3 given specific week and a store. Same with cenoz1, and cenoz2.
egen cenoz1 = mean(cenoz/ (category == 1)), by(week store)
egen cenoz2 = mean(cenoz/ (category == 2)), by(week store)
egen cenoz3 = mean(cenoz/ (category == 3)), by(week store)
It turns out that category 3 was not sold in any of the stores (1 and 2) in week 3. As a result, missing values are generated.
week store cenoz category cenoz1 cenoz2 cenoz3
1 1 2 1 2 4 3
1 1 4 2 2 4 3
1 1 3 3 2 4 3
1 2 5 1 5 7 8
1 2 7 2 5 7 8
1 2 8 3 5 7 8
2 1 4 1 4 1 10
2 1 1 2 4 1 10
2 1 10 3 4 1 10
2 2 3 1 3 4 7
2 2 4 2 3 4 7
2 2 7 3 3 4 7
3 1 5 1 5 3 .
3 1 3 2 5 3 .
3 2 5 1 5 4 .
3 2 4 2 5 4 .
I would like to replace missing values of a particular week with values of the previous week and matching store. That's to say:
replace missing values for category 3 in week 3 in store 1
with values for category 3 in week 2 in store 1
and
replace missing values for category 3 in week 3 in store 2
with values for category 3 in week 2 in store 2
Can I use command replace
or is it something more complicated than that?
Something like:
replace cenoz1 = cenoz1[_n-1] if missing(cenoz1)
But I also need to the stores to match, not just the time variable week
.
I found this code provided by Nicholas Cox at http://www.stata.com/support/faqs/data-management/replacing-missing-values/:
by id (time), sort: replace myvar = myvar[_n-1] if myvar >= .
Do you think
by store (week), sort: cenoz1 = cenoz1[_n-1] if missing(cenoz1)
is sufficient?
UPDATE:
When I use the code
by store (week category), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3)
It seems it delivers correct values:
week store cenoz category cenoz1 cenoz2 cenoz3
1 1 2 1 2 4 3
1 1 4 2 2 4 3
1 1 3 3 2 4 3
1 2 5 1 5 7 8
1 2 7 2 5 7 8
1 2 8 3 5 7 8
2 1 4 1 4 1 10
2 1 1 2 4 1 10
2 1 10 3 4 1 10
2 2 3 1 3 4 7
2 2 4 2 3 4 7
2 2 7 3 3 4 7
3 1 5 1 5 3 10
3 1 3 2 5 3 10
3 2 5 1 5 4 7
3 2 4 2 5 4 7
Is there any way to double check this code given that my dataset is quite large?
How make this code not so specific but applicable to any missing cenoz
if it finds one with missing vaues? (cenoz1, cenoz2, cenoz3, cenoz4...cenoz12)
Upvotes: 1
Views: 182
Reputation: 11102
A quick note on why your code
by store (category week), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3)
won't work.
It will work for the example dataset you give. But a slight modification can give unexpected results. Consider the following example:
clear all
set more off
input week store cenoz category
1 1 2 1
1 1 4 2 /*
1 1 3 3 deleted observation */
1 2 5 1
1 2 7 2
1 2 8 3
2 1 4 1
2 1 1 2
2 1 10 3
2 2 3 1
2 2 4 2
2 2 7 3
3 1 5 1
3 1 3 2
3 1 999 3 // new observation
3 2 5 1
3 2 4 2
end
egen cenoz1 = mean(cenoz/ (category == 1)), by(week store)
egen cenoz2 = mean(cenoz/ (category == 2)), by(week store)
egen cenoz3 = mean(cenoz/ (category == 3)), by(week store)
order store category week
sort store category week
list, sepby(store category)
*----- method 1 (your code) -----
gen cenoz3x1 = cenoz3
by store (category week), sort: replace cenoz3x1 = cenoz3x1[_n-1] if missing(cenoz3x1)
*----- method 2 (Nick's code) -----
gen cenoz3x2 = cenoz3
by store category (week), sort: replace cenoz3x2 = cenoz3x2[_n-1] if missing(cenoz3x2)
list, sepby(store category)
Method 1 will assign the price of a category 1 article to a category 2 article (observation 4 of cenoz3x1
). Presumably, something you don't want. If you want to avoid this, then the groups should be based on store category
and not just store
.
The best place to start reading is help
and the manuals.
Upvotes: 0
Reputation: 37208
If you want to use the previous information for the same store and the same category, that should be
by store category (week), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3)
A generalization could be
sort store category week
forval j = 1/12 {
by store category: replace cenoz`j' = cenoz`j'[_n-1] if missing(cenoz`j')
}
However this carrying forward is a fairly crude method of interpolation. Consider linear, cubic, cubic spline, PCHIP methods of interpolation. Use search
to find Stata programs.
Upvotes: 1