Reputation: 867
So I am gathering data from the S&P 500,from a csv file. My question is how would I create one large dataframe, that has 500 columns and with all of the prices. The code is currently:
import pandas as pd
import pandas_datareader as web
import datetime as dt
from datetime import date
import numpy as np
def get_data():
start = dt.datetime(2020, 5, 30)
end = dt.datetime.now()
csv_file = pd.read_csv(os.path.expanduser("/Users/benitocano/Downloads/copyOfSandP500.csv"), delimiter = ',')
tickers = pd.read_csv("/Users/benitocano/Downloads/copyOfSandP500.csv", delimiter=',', names = ['Symbol', 'Name', 'Sector'])
for i in tickers['Symbol'][:5]:
df = web.DataReader(i, 'yahoo', start, end)
df.drop(['High', 'Low', 'Open', 'Close', 'Volume'], axis=1, inplace=True)
get_data()
So as the code shows right now it is just going yo create 500 individual dataframes, and so I am asking how to make it into one large dataframe. Thanks! EDIT: The CSV file link is: https://datahub.io/core/s-and-p-500-companies
I have tried this to the above code:
for stock in data:
series = pd.Series(stock['Adj Close'])
df = pd.DataFrame()
df[ticker] = series
print(df)
Though the output is only one column like so:
ADM
Date
2020-06-01 38.574604
2020-06-02 39.348278
2020-06-03 40.181465
2020-06-04 40.806358
2020-06-05 42.175167
... ...
2020-11-05 47.910000
2020-11-06 48.270000
2020-11-09 49.290001
2020-11-10 50.150002
2020-11-11 50.090000
Why is printing only one column, rather than the rest if them?
Upvotes: 1
Views: 973
Reputation: 8219
The answer depends on the structure of the dataframes that your current code produces. As the code depends on some files on your local drive, we cannot run it so hard to be specific here. In general, there are many options, among the most common I would say are
pandas.concat(..., axis=1)
on that list to concatenate dfs column by column, see heremerge
or join
) your dfs on the Date column that I assume each df has, see hereUpvotes: 1