Olga
Olga

Reputation: 289

If current week has missing value, how to replace it with the value from previous week?

I have a dataset that shows how much was paid ("cenoz" - cents per ounce) per product category during specific week and in a specific store.

clear
set more off
input week  store   cenoz   category
        1      1      2         1
        1      1      4         2
        1      1      3         3
        1      2      5         1
        1      2      7         2
        1      2      8         3
        2      1      4         1
        2      1      1         2
        2      1      10        3
        2      2      3         1
        2      2      4         2
        2      2      7         3
        3      1      5         1
        3      1      3         2
        3      2      5         1
        3      2      4         2
end    

I create a new variable cenoz3 that indicates how much on average was paid for category 3 given specific week and a store. Same with cenoz1, and cenoz2.

egen cenoz1 = mean(cenoz/ (category == 1)), by(week store) 
egen cenoz2 = mean(cenoz/ (category == 2)), by(week store) 
egen cenoz3 = mean(cenoz/ (category == 3)), by(week store) 

It turns out that category 3 was not sold in any of the stores (1 and 2) in week 3. As a result, missing values are generated.

week    store   cenoz   category    cenoz1  cenoz2  cenoz3
  1       1       2        1           2       4      3
  1       1       4        2           2       4      3
  1       1       3        3           2       4      3
  1       2       5        1           5       7      8
  1       2       7        2           5       7      8
  1       2       8        3           5       7      8
  2       1       4        1           4       1      10
  2       1       1        2           4       1      10
  2       1       10       3           4       1      10
  2       2       3        1           3       4      7
  2       2       4        2           3       4      7
  2       2       7        3           3       4      7
  3       1       5        1           5       3      .
  3       1       3        2           5       3      .
  3       2       5        1           5       4      .
  3       2       4        2           5       4      .

I would like to replace missing values of a particular week with values of the previous week and matching store. That's to say:

replace missing values for category 3 in week 3 in store 1 
           with values for category 3 in week 2 in store 1

and

replace missing values for category 3 in week 3 in store 2 
           with values for category 3 in week 2 in store 2

Can I use command replace or is it something more complicated than that?

Something like:

replace cenoz1 = cenoz1[_n-1] if missing(cenoz1) 

But I also need to the stores to match, not just the time variable week.

I found this code provided by Nicholas Cox at http://www.stata.com/support/faqs/data-management/replacing-missing-values/:

by id (time), sort: replace myvar = myvar[_n-1] if myvar >= . 

Do you think

 by store (week), sort: cenoz1 = cenoz1[_n-1] if missing(cenoz1) 

is sufficient?

UPDATE:

When I use the code

by store (week category), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3) 

It seems it delivers correct values:

week    store   cenoz   category    cenoz1  cenoz2  cenoz3
  1       1       2        1           2       4      3
  1       1       4        2           2       4      3
  1       1       3        3           2       4      3
  1       2       5        1           5       7      8
  1       2       7        2           5       7      8
  1       2       8        3           5       7      8
  2       1       4        1           4       1      10
  2       1       1        2           4       1      10
  2       1       10       3           4       1      10
  2       2       3        1           3       4      7
  2       2       4        2           3       4      7
  2       2       7        3           3       4      7
  3       1       5        1           5       3      10
  3       1       3        2           5       3      10
  3       2       5        1           5       4      7
  3       2       4        2           5       4      7

Is there any way to double check this code given that my dataset is quite large?

How make this code not so specific but applicable to any missing cenoz if it finds one with missing vaues? (cenoz1, cenoz2, cenoz3, cenoz4...cenoz12)

Upvotes: 1

Views: 182

Answers (2)

Roberto Ferrer
Roberto Ferrer

Reputation: 11102

A quick note on why your code

by store (category week), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3)

won't work.

It will work for the example dataset you give. But a slight modification can give unexpected results. Consider the following example:

clear all
set more off

input week  store   cenoz   category
        1      1      2         1
        1      1      4         2 /* 
        1      1      3         3 deleted observation */
        1      2      5         1
        1      2      7         2
        1      2      8         3
        2      1      4         1
        2      1      1         2
        2      1      10        3
        2      2      3         1
        2      2      4         2
        2      2      7         3
        3      1      5         1
        3      1      3         2
        3      1    999          3 // new observation
        3      2      5         1
        3      2      4         2
end 

egen cenoz1 = mean(cenoz/ (category == 1)), by(week store) 
egen cenoz2 = mean(cenoz/ (category == 2)), by(week store) 
egen cenoz3 = mean(cenoz/ (category == 3)), by(week store) 

order store category week
sort store category week
list, sepby(store category)

*----- method 1 (your code) -----

gen cenoz3x1 = cenoz3
by store (category week), sort: replace cenoz3x1 = cenoz3x1[_n-1] if missing(cenoz3x1) 

*----- method 2 (Nick's code) -----

gen cenoz3x2 = cenoz3
by store category (week), sort: replace cenoz3x2 = cenoz3x2[_n-1] if missing(cenoz3x2) 

list, sepby(store category)

Method 1 will assign the price of a category 1 article to a category 2 article (observation 4 of cenoz3x1). Presumably, something you don't want. If you want to avoid this, then the groups should be based on store category and not just store.

The best place to start reading is help and the manuals.

Upvotes: 0

Nick Cox
Nick Cox

Reputation: 37208

If you want to use the previous information for the same store and the same category, that should be

by store category (week), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3) 

A generalization could be

sort store category week 
forval j = 1/12 { 
    by store category: replace cenoz`j' = cenoz`j'[_n-1] if missing(cenoz`j') 
} 

However this carrying forward is a fairly crude method of interpolation. Consider linear, cubic, cubic spline, PCHIP methods of interpolation. Use search to find Stata programs.

Upvotes: 1

Related Questions