Reputation: 537
I have around 40 .dta
files, all containing the same variables but for different units of observations. I want to use Stata to loop through the directory where I have all 40 files to do the following:
Load each file into Stata and then export into an Excel sheet. The idea is to have one Excel workbook, with 40 sheets - one per .dta file.
In each of the 40 files, count the number of number of distinct observations for the variable named car_type
. The output can be saved into a different Excel sheet than for 1. But I'm open for suggestions.
I have never worked with local
or loops before so any help would be greatly appreciated. I'm aware that for step 1 I could write a do
file with use "C:\File1.dta
and then export excel using "C:\Workbook.xlsx", sheet("File1") firstrow(variables) replace
for each file, but I don't want to repeat this manually one-by-one for each file.
Upvotes: 0
Views: 1789
Reputation: 9470
You can use this as a template to get started:
set more off
/* (A) Create 5 Fake Datasets file1,...,file5 */
sysuse auto, clear
forvalues i=1/5 {
preserve
keep if rep78==`i'
tab rep78
save "file`i'.dta", replace
restore
}
/* (B) Export Fake Files to Excel */
capture erase "my_excel_file.xls" // erase Excel file if it exists
ssc install distinct
ssc install fs
fs "file*.dta" // get a list of files to export to Excel
/* loop over this list, opening each one, calculate distinct values, and put the data below that */
foreach f in `r(files)' {
quietly {
use "`f'", clear
distinct make
generate distinct_vals_of_make = r(ndistinct) in 1
export excel distinct_vals_of_make using "my_excel_file", sheet("`f'") firstrow(variables) cell(A1)
drop distinct_vals_of_make
export excel using "my_excel_file", sheet("`f'", modify) firstrow(variables) cell(A4)
}
}
shell open "my_excel_file.xls"
Answers:
r(files) is the name of the local macro that contains the list of the files returned by fs
. When you surround the name with a back quote and an end quote, r(files) will spit out that list. "foreach f in" is just a way to refer to each element of that list as you loop over them sequentially. You could have written "foreach file_name in" just as well. Each time the loop iterates, f (or file_name) will be redefined to contain the next file, which can be obtained with back quote plus f plus end quote. in 1
means in the first row of the Stata data set.
Finally, I think you can do that with
/* loop over this list, opening each one, calculate distinct values, update the stats sheet, and put the data in a seprate sheet */
/* Initialize the names */
gen file_name=.
gen distinct_vals_of_make=.
export excel file_name distinct_vals_of_make using "my_excel_file", sheet("stats") firstrow(variables)
local j=2
foreach f in `r(files)' {
quietly {
use "`f'", clear
gen file_name = "`f'" in 1
distinct make
generate distinct_vals_of_make = r(ndistinct) in 1
export excel file_name distinct_vals_of_make using "my_excel_file", sheet("stats", modify) cell(A`j')
local ++j
drop distinct_vals_of_make file_name
export excel using "my_excel_file", sheet("`f'") firstrow(variables)
}
}
Upvotes: 1