StatsScared
StatsScared

Reputation: 537

Looping through several Stata files

I have around 40 .dta files, all containing the same variables but for different units of observations. I want to use Stata to loop through the directory where I have all 40 files to do the following:

  1. Load each file into Stata and then export into an Excel sheet. The idea is to have one Excel workbook, with 40 sheets - one per .dta file.

  2. In each of the 40 files, count the number of number of distinct observations for the variable named car_type. The output can be saved into a different Excel sheet than for 1. But I'm open for suggestions.

I have never worked with local or loops before so any help would be greatly appreciated. I'm aware that for step 1 I could write a do file with use "C:\File1.dta and then export excel using "C:\Workbook.xlsx", sheet("File1") firstrow(variables) replace for each file, but I don't want to repeat this manually one-by-one for each file.

Upvotes: 0

Views: 1789

Answers (1)

dimitriy
dimitriy

Reputation: 9470

You can use this as a template to get started:

set more off

/* (A) Create 5 Fake Datasets file1,...,file5 */ 
sysuse auto, clear

forvalues i=1/5 {
    preserve
        keep if rep78==`i'
        tab rep78
        save "file`i'.dta", replace
    restore
}


/* (B) Export Fake Files to Excel */
capture erase "my_excel_file.xls" // erase Excel file if it exists

ssc install distinct
ssc install fs

fs "file*.dta" // get a list of files to export to Excel

/* loop over this list, opening each one, calculate distinct values, and put the data below that */
foreach f in `r(files)' {
    quietly {
        use "`f'", clear
        distinct make
        generate distinct_vals_of_make = r(ndistinct) in 1
        export excel distinct_vals_of_make using "my_excel_file", sheet("`f'") firstrow(variables) cell(A1)
        drop distinct_vals_of_make
        export excel using "my_excel_file", sheet("`f'", modify) firstrow(variables) cell(A4)
    }
}

shell open "my_excel_file.xls"

Answers:

r(files) is the name of the local macro that contains the list of the files returned by fs. When you surround the name with a back quote and an end quote, r(files) will spit out that list. "foreach f in" is just a way to refer to each element of that list as you loop over them sequentially. You could have written "foreach file_name in" just as well. Each time the loop iterates, f (or file_name) will be redefined to contain the next file, which can be obtained with back quote plus f plus end quote. in 1 means in the first row of the Stata data set.

Finally, I think you can do that with

   /* loop over this list, opening each one, calculate distinct values, update the stats sheet, and put the data in a seprate sheet */
/* Initialize the names */
gen file_name=.
gen distinct_vals_of_make=. 
export excel file_name distinct_vals_of_make using "my_excel_file", sheet("stats") firstrow(variables)
local j=2

foreach f in `r(files)' {
    quietly {
        use "`f'", clear
        gen file_name = "`f'" in 1  
        distinct make
        generate distinct_vals_of_make = r(ndistinct) in 1
        export excel file_name distinct_vals_of_make using "my_excel_file", sheet("stats", modify) cell(A`j')
        local ++j

        drop distinct_vals_of_make file_name
        export excel using "my_excel_file", sheet("`f'") firstrow(variables)
    }
}

Upvotes: 1

Related Questions