Reputation: 3
[excel_1.xlsx,excel_2.xlsx,...,excel_12.xlsx]
.
At first I want to import them into dataframes and then append them into a big dataframe, then df.to_dta
, but python shows error and said:
MemoryError
I guess the problem is that the appended dataframe is too big.
[excel_1.xlsx,excel_2.xlsx,...,excel_12.xlsx]
to
[excel_1.dta,excel_2.dta,...,excel_12.dta]
and append them in Stata, but I don't know how to do that.
import pandas as pd
IO = 'excel_1.xlsx'
df = pd.read_excel(io=IO, skiprows = [1,2] ,
dtype={"Opnprc": "str","Hiprc": "str","Loprc": "str","Clsprc": "str","Dnshrtrd": "str","Dnvaltrd": "str","Dsmvosd": "str",
"Dsmvtll": "str","Dretwd": "str","Dretnd": "str","Adjprcwd": "str","Adjprcnd": "str","Markettype": "str",
"Trdsta": "str"})
df.to_stata('excel1.dta')
I guess a for
loop should work, but I don't know how to do that.
(the append code:
import os
import pandas as pd
cwd = os.path.abspath('D:\\onedrive\\test2')
files = os.listdir(cwd)
print(files)
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
df = df.append(pd.read_excel(file, skiprows = [1,2] ,
dtype={"Opnprc": "str","Hiprc": "str","Loprc": "str","Clsprc": "str","Dnshrtrd": "str","Dnvaltrd": "str","Dsmvosd": "str",
"Dsmvtll": "str","Dretwd": "str","Dretnd": "str","Adjprcwd": "str","Adjprcnd": "str","Markettype": "str",
"Trdsta": "str"}), ignore_index=True)
df.head()
df.to_stata('test.dta')
Upvotes: 0
Views: 317
Reputation: 421
Here is how to transform each Excel file to a Stata file using a for
loop in python3.
import pandas as pd
IO = 'excel_{}.xlsx'
num_files = 12
for i in range(1, num_files + 1):
df = pd.read_excel(
io=IO.format(i),
skiprows = [1,2] ,
dtype={"Opnprc": "str","Hiprc": "str","Loprc": "str","Clsprc": "str","Dnshrtrd": "str","Dnvaltrd": "str","Dsmvosd": "str",
"Dsmvtll": "str","Dretwd": "str","Dretnd": "str","Adjprcwd": "str","Adjprcnd": "str","Markettype": "str",
"Trdsta": "str"})
df.to_stata('excel_{}.dta'.format(i))
Upvotes: 1