Command to insert a word between two words

I have a string variable in Stata with entries like the following

var1 12.34 var2 23.45 var3 34.56
var2 54.32 var3 67.32 
var1 32.32 var3 23.45

I want to create three variables with the above string variable by extracting only the numbers (variable names var1, var2 and var3). However, not all entries have all three variables. So, if I simply extract the numbers by parsing, entry 2 will have var2's value under var1's value and var3's value under var2's value.

Is there a way to make all the entries uniform such that if var2 is missing, the entry needs to be replaced with "var2 0" in between var1 and var3 and if var1 is missing, the entry needs to be replaced with "var1 0" before var2.

Upvotes: 0

Views: 455

Answers (2)

Nick Cox
Nick Cox

Reputation: 37278

I am ignoring the question as inserting words (really two words) isn't needed at all to parse your input as you wish. Indeed, inserting words would be awkward as you need to find which words are present before you can determine which to insert (and where to insert them).

Here's one way to do it. You will need to change 3 wired into this example code if you really have more variables than 3 packed inside your data variable.

* Example generated by -dataex-. For more info, type help dataex
clear
input str32 data
"var1 12.34 var2 23.45 var3 34.56"
"var2 54.32 var3 67.32 "          
"var1 32.32 var3 23.45"           
end

split data, destring 

* initialize to 0 
forval j = 1/3 { 
    gen var`j' = 0 
}

local K = r(k_new) - 1

* look in variables 1, 3, 5, ... for where the values go 
* look in variables 2, 4, 6, ... for what the values are 
forval k = 1(2)`K' {
    local kP1 = `k' + 1 
    forval j = 1/3 { 
        replace var`j' = data`kP1' if data`k' == "var`j'"
    }
}

. l data1-var3

     +-----------------------------------------------------------------------+
     | data1   data2   data3   data4   data5   data6    var1    var2    var3 |
     |-----------------------------------------------------------------------|
  1. |  var1   12.34    var2   23.45    var3   34.56   12.34   23.45   34.56 |
  2. |  var2   54.32    var3   67.32               .       0   54.32   67.32 |
  3. |  var1   32.32    var3   23.45               .   32.32       0   23.45 |
     +-----------------------------------------------------------------------+

Here is another way to do it that doesn't wire in the number of variables or what they are named.

* Example generated by -dataex-. For more info, type help dataex
clear
input str32 data
"var1 12.34 var2 23.45 var3 34.56"
"var2 54.32 var3 67.32 "          
"var1 32.32 var3 23.45"           
end

split data 
drop data 
gen long id = _n 
reshape long data, i(id) 
gen which = data[_n-1] if real(data) < .
keep if !missing(which)
drop _j
reshape wide data, i(id) j(which) string
rename (data*) (*)
ds id, not 
local vars `r(varlist)' 
destring `vars', replace
mvencode `vars', mv(0)

list 

Upvotes: 1

TheIceBear
TheIceBear

Reputation: 3255

Here is a similar but more generalizable solution where the number of variables does not need to be set manually, and the names do not need to have an iterable naming convention.

clear
input str32 data
"aaa 12.34 var 23.45 yyy 34.56"
"var 54.32 yyy 67.32 "          
"aaa 32.32 yyy 23.45"           
end

* Update this local to the name of the variable with data in strings
local datavar data

* Split data in string
split `datavar', destring 
local k = r(k_new)

* Get a list of all variables used in any row
local datavars ""
forvalue i = 1(2)`k' {
    levelsof `datavar'`i', local(vars) clean
    local datavars : list datavars | vars  
}

* Loop over all the variables to be created
foreach var of local datavars {
    * Initiate variable
    gen `var' = 0
    * Copy values for that value
    forvalue i = 1(2)`k' {
        local j = `i' + 1
        replace `var' = `datavar'`j' if `datavar'`i' == "`var'" 
    }
}

Upvotes: 1

Related Questions