Arthur Morris

Reputation: 1348

Type mismatch when replacing missing observations with previous values using time-series operators in Stata

Consider the following example. I begin with an str6 'name' variable, and a year for two entities observed every other year.

clear
input str6 nameStr year
"A" 2002
"A" 2004
"A" 2006
"B" 2002
"B" 2004
"B" 2006
end

Then I use tsfill to balance the panel:

egen id = group(nameStr)
xtset id year
tsfill

The dataset is now:

input str6 nameStr year id
"A" 2002 1
""  2003 1 
"A" 2004 1
""  2005 1
"A" 2006 1
"B" 2002 2
""  2003 2 
"B" 2004 2
""  2005 2 
"B" 2006 2
end

Now I could use something like xfill to fill in the missing string identifier. Or, based on the related Stata FAQ and the documentation for time-series varlists (help tsvarlist), I expect something like the following to fill in the values of nameStr:

sort id year // not required because the data are still sorted from xtset and tsfill
replace nameStr = nameStr[_n-1] if mi(nameStr) & id[_n-1] == id

and it does.

However, I also expect the following to produce the same behavior, and it does not.

replace nameStr = l.nameStr if mi(nameStr)

Instead Stata returns:

type mismatch
r(109);

While there are several ways to work around this (I've listed two), I'm interested in understanding why this happens. Most similar discussions address cases where two variables of differing types are involved. That isn't the case here, since only one variable is involved.

Upvotes: 1

Views: 663

Answers (1)

Nick Cox

Reputation: 37208

Stata does not allow time-series operators to be applied to string variables. If you think about it, you will see that previous (lagged) and following (leading) string values make sense, but differences don't, at least not so much. The only simple interpretation of a difference would be binary, namely that the strings at two times are either the same or different.

So, Stata is not implying that you can't work with other string values for any panel; it just doesn't support calculations on strings using time series operators.

In addition to the syntax you mention, stripolate from SSC supports string interpolation: see this Statalist thread.
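
For reference, the explicit-subscript approach from the question can also be written with a by: prefix, so that the within-panel restriction is handled by the grouping rather than by a separate id check. This is only a sketch using the variable names from the question (id, year, nameStr), and it assumes the panel has already been filled with tsfill so that every missing name has an earlier observation to copy from.

```stata
* Carry the last observed name forward within each panel.
* Under by:, _n counts within each id group, so nameStr[_n-1] never
* reaches into the previous panel; for the first observation of a group
* it evaluates to the empty string, which leaves the value unchanged.
bysort id (year): replace nameStr = nameStr[_n-1] if missing(nameStr)

* Time-series operators remain available for numeric variables,
* for example the lagged year within each panel:
generate prevYear = L.year
```

Because replace works through the observations in order, a run of consecutive missing values is filled one after another from the same starting value.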

Upvotes: 1
