Reputation: 25
I'm reading tables from the web using pd.read_html and using a for loop in pandas to create a single dataframe from multiple dataframes. I'm able to successfully create a data frame for any given year but I'm missing the correct logic in the for loop to:
(1) Read data, (2) create a dataframe (3) Go to the next year and (4) Append that dataframe to previous dataframe.
The ideal outcome should be 1 dataframe with ~500 rows and 13 columns (for 2 years worth of data). Thanks!
for x in range(2017, 2019):
dfs = pd.read_html('https://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/TextView.aspx?data=yieldYear&year=(%d)' % x, skiprows=1)
for df in dfs:
print df
DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr', '3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
df.columns = DateList
Upvotes: 2
Views: 20128
Reputation: 107567
Consider building a list of data frames, then concatenate items once outside loop. Specifically, below uses a list comprehension that also assigns columns in each iteration, followed by a pd.concat
call.
url = 'https://www.treasury.gov/resource-center/data-chart-center/interest-rates/' + \
'pages/TextView.aspx?data=yieldYear&year=({yr})'
DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr',
'3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
dfs = [(pd.read_html(url.format(yr=x), skiprows=1)[1]
.set_axis(DateList, axis='columns', inplace=False)) for x in range(2017, 2019)]
final_df = pd.concat(dfs, ignore_index=True)
print(final_df.head())
# Date 1 mo 2 mo 3 mo 6 mo ... 5 yr 7 yr 10 yr 20 yr 30 yr
# 0 01/03/17 0.52 NaN 0.53 0.65 ... 1.94 2.26 2.45 2.78 3.04
# 1 01/04/17 0.49 NaN 0.53 0.63 ... 1.94 2.26 2.46 2.78 3.05
# 2 01/05/17 0.51 NaN 0.52 0.62 ... 1.86 2.18 2.37 2.69 2.96
# 3 01/06/17 0.50 NaN 0.53 0.61 ... 1.92 2.23 2.42 2.73 3.00
# 4 01/09/17 0.50 NaN 0.50 0.60 ... 1.89 2.18 2.38 2.69 2.97
Upvotes: 3
Reputation: 71
You probably want something like this:
myDataFrame = pd.DataFrame()
for x in range(2017, 2019):
dfs = pd.read_html('https://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/TextView.aspx?data=yieldYear&year=(%d)' % x, skiprows=1)
for df in dfs:
print df
myDataFrame = myDataFrame.append(df)
DateList = ['Date', '1 mo', '2 mo', '3 mo', '6 mo', '1 yr', '2 yr', '3 yr', '5 yr', '7 yr', '10 yr', '20 yr', '30 yr']
df.columns = DateList
Edit: See @Parfait's comment. Apparently you shouldn't use DataFrame.append in a loop as it may cause problems with memory.
Upvotes: 3