user2700264

Reputation: 177

Stata: Appending multiple files and extracting variables from file names

I have 114 files with the .dat extension to convert to Stata/SE format and append, each with a substantial number of variables (varying from 81 to 16800). I have reset the maximum number of variables to 32000 (set maxvar 32000), increased the memory (set mem 500m), and was using the following algorithm to combine a large number of files and to generate several variables by extracting parts of the file names: http://www.ats.ucla.edu/stat/stata/faq/append_many_files.htm

The code looks as follows:

cd "C:\Users\..."
! dir *.dat /a-d /b >d:\Stata_directory\Products_batchfilelist.txt


file open myfile using "d:\Stata_directory\Products_batchfilelist.txt", read
file read myfile line
drop _all
insheet using `line', comma names



gen n = substr("`line'",10,1)
gen m = substr("`line'",12,1)
gen playersnum = substr("`line'",14,1)


save Products_merged.dta, replace

drop _all
file read myfile line

while r(eof)==0 {
    insheet using `line', comma names
    gen n = substr("`line'",10,1)
    gen m = substr("`line'",12,1)
    generate playersnum = substr("`line'",14,1)

    save `line'.dta, replace
    append using Products_merged.dta
    save Products_merged.dta,replace
    drop _all
    file read myfile line
}

The problem is that although the variables n, m, and playersnum extracted from the file names are present in each individual file, they disappear in the final "Products_merged.dta" file. Could anyone tell me what the problem could be and whether it is possible to solve it with Stata/SE?

Upvotes: 2

Views: 1965

Answers (1)

SOConnell

Reputation: 793

I don't see an obvious problem with the code that would be causing this. It may have something to do with the limits in SE, but that is still unlikely in my mind (you would see an error if a command does something to exceed maxvar).

My only suggestion would be to put a couple of commands inside the append loop that will help you debug:

save `line'.dta, replace
append using Products_merged.dta
assert m!="" & n!="" & playersnum!=""
save Products_merged.dta,replace

This will do two things: verify that your variables exist after each new append (your first-order concern), since assert stops with an error if any of them is missing, and check that they are never blank (not your stated concern but a good check anyway).
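For reference, here is a rough sketch of how the whole loop might look with those checks in place. It is not a tested fix. It assumes myfile is already open, that `line' already holds the second file name (as in the question's code), and that the three variables are strings, which is what substr() produces. The confirm and describe using lines are my additions, not part of the original code.

```stata
* Sketch only: the question's loop with the debugging checks added.
* Assumes myfile is open and `line' was filled by an earlier file read.
while r(eof)==0 {
    * quotes protect file names containing spaces; clear replaces drop _all
    insheet using "`line'", comma names clear
    gen n          = substr("`line'", 10, 1)
    gen m          = substr("`line'", 12, 1)
    gen playersnum = substr("`line'", 14, 1)

    save "`line'.dta", replace
    append using Products_merged.dta

    * stops with an error if any variable is missing or is not a string
    confirm string variable n m playersnum
    * stops with an error if any observation has a blank value
    assert n != "" & m != "" & playersnum != ""

    save Products_merged.dta, replace
    * keep file read last so r(eof) is fresh when the while condition is tested
    file read myfile line
}
file close myfile

* inspect the saved file without loading it into memory
describe n m playersnum using Products_merged.dta
```

If the loop stops at confirm or assert, the file name in `line' at that point tells you which input caused the variables to go missing or blank.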

If you post a couple of the files I could probably give a better answer.

Upvotes: 3
