Alec M

Reputation: 35

String to indicator variable, type mismatch error

I am trying to convert a string variable (type str2, format %9s) into an indicator variable in Stata.

However, I keep receiving the following error:

type mismatch r(109)

I am using the 2016 ANES set and I am essentially trying to group states into open primary and closed primary/caucus states.

I have attempted the following code:

gen oprim= (state=="AL" & "AK" & "CO" & "GA" &...)

gen oprim=1 if state=="AL" & "AK" & "CO" & "GA" &...

I have had trouble converting this variable before. For example, I tried generating the new indicator variable without putting quotation marks around the state codes.

I have also tried to destring the variable, but I am receiving the following output:

destring state, generate(statenum) float
state: contains nonnumeric characters; no generate

Any help anyone could offer would be much appreciated.

Upvotes: 1

Views: 1490

Answers (2)

user8682794


Using the first ten observations of the census toy dataset:

sysuse census, clear
keep if _n <= 10

The following works for me:

generate oprim = 0 
replace oprim = 1 if state2 == "AZ" | state2 == "DE"

list state2 oprim, separator(0)

     +----------------+
     | state2   oprim |
     |----------------|
  1. | AL           0 |
  2. | AK           0 |
  3. | AZ           1 |
  4. | AR           0 |
  5. | CA           0 |
  6. | CO           0 |
  7. | CT           0 |
  8. | DE           1 |
  9. | FL           0 |
 10. | GA           0 |
     +----------------+

Upvotes: 1

Nick Cox

Reputation: 37208

Let's spell out why the code in the question is wrong. The OP doesn't give example data but the errors are all identifiable without such data, assuming naturally that state is a string variable in the dataset.

First, we can leave out the ... (which no one presumes are legal) and the parentheses (which make no difference).

gen oprim = state=="AL" & "AK" & "CO" & "GA"

gen oprim=1 if state=="AL" & "AK" & "CO" & "GA" 

Either of these will fail because Stata parses the expression (the if condition, in the second command) as

if

state == "AL"

& "AK"

& "CO"

& "GA"

state == "AL" is a true-or-false condition evaluated as 0 or 1, but none of "AK", "CO", "GA" is a true-or-false condition. They are all string values, so the commands fail: Stata needs to see something numeric as each of the elements in an if condition. Although clearly silly,

gen oprim = state == "AL" & 42

would be legal as 42 is numeric (and in true-or-false evaluations counts as true). Stata won't fill in state ==, which is what you hope to see implied.
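The difference between a numeric and a string operand can be checked interactively without any dataset. This is a minimal sketch; the literal values are arbitrary, and capture noisily is used only so that the second line shows its error without stopping a do-file.

```stata
* Numeric operand: any nonzero value counts as true, so this displays 1
display ("AL" == "AL") & 42

* String operand: "AK" is not numeric, so this should stop with
* type mismatch, r(109), the same error as in the question
capture noisily display ("AL" == "AL") & "AK"
display _rc
```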

If you rewrite

gen oprim = state == "AL" & state == "AK" & state == "CO" & state == "GA" 

then you have a legal command. It's just not at all what you evidently want. It's impossible for state to be equal to different string values in the same observation, which is what this command is testing for. You're confusing & (and) with | (or).
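One way to see the difference is to count matching observations under each operator. The sketch below assumes the census toy dataset shipped with Stata and its two-letter variable state2, as in the other answer, rather than the OP's own data.

```stata
sysuse census, clear

* "and": no observation can hold two different state codes, so this counts 0
count if state2 == "AL" & state2 == "AK"

* "or": counts every observation that matches either code
count if state2 == "AL" | state2 == "AK"
```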

gen oprim = state == "AL" | state == "AK" | state == "CO" | state == "GA" 

Such statements get long and are tedious and error-prone to write out, but Stata has an alternative syntax:

gen oprim = inlist(state, "AL", "AK", "CO", "GA") 

There are limits to that -- and yet other strategies too -- but I will leave this answer there without addressing further issues.
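One such limit, as far as I recall from the documentation, is that inlist() accepts at most 10 arguments when they are strings (the variable plus nine values). A possible workaround is to combine several calls with |. The sketch below again assumes the census toy dataset and state2; the state codes are arbitrary and are not a real classification of open-primary states.

```stata
sysuse census, clear

* Each inlist() call stays within the string-argument limit;
* | joins the calls so that a match in any of them yields 1
generate byte oprim = inlist(state2, "AL", "AK", "CO", "GA") | ///
    inlist(state2, "AZ", "DE")

* Check the result: oprim should be 0 or 1 in every observation
tabulate oprim
```

Storing the indicator as byte is optional; it only saves memory compared with the default float.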

Upvotes: 0
