Reputation: 419
So, I'm trying to import the stock prices for S&P500 (SPY) BP, the (O&G/energy company). The result I am looking for is a "table" of 3 columns; 1 for dates, 1 for the Adj Close of SPY and 1 for the Adj Close of BP. However, my code produces:
ValueError: columns overlap but no suffix specified: Index(['SPY'], dtype='object')
I understand what this error is telling me though: The index column “Adj Close” has an overlap. Irrespective of the stock, the column we are extracting each time is called “SPY”. The join() method I am using is confused because column names must be unique, well, something like that is how I've interpreted it...
The code:
import pandas as pd
def test_run():
start_date=('2016-03-10') #start date parameter
end_date=('2017-03-10') #end date parameter
dates=pd.date_range(start_date,end_date)
df1=pd.DataFrame(index=dates) #create empty dataframe df1
dfSPY=pd.read_csv("C:\SPY.csv",index_col="Date",parse_dates=True,
usecols=['Date','Adj Close'],na_values=['nan'])
#create dataframe for SPY stock
#rename Adj Close column to SPY to prevent clash
dfSPY=dfSPY.rename(columns={'Adj Close':'SPY'})
#join the 2 dataframes using DataFrame.join(), and how='inner'
df1=df1.join(dfSPY,how='inner')
#read in more stocks; SPY & BP
symbols=['SPY','BP']
for symbol in symbols:
df_temp=pd.read_csv("C{}.csv".format(symbol),index_col='Date',parse_dates=True,
usecols=['Date','Adj Close'],na_values=['nan'])
#rename to prevent clash
df_temp=df_temp.rename(columns={'Adj Close':symbol})
df1=df1.join(df_temp) #use default how='left'
print(df1)
if __name__=="__main__":
test_run()
So, that's the code I've got. If there's anyone out there who can shed some light as to what an Earth I've done wrong, please let me know.
Many thanks!
Upvotes: 1
Views: 159
Reputation: 1343
The code you provided is overriding the value of df_temp in your for loop- it will only end up with the value assigned during the last iteration. I assume the last two lines posted below are actually inside your for loop:
for symbol in symbols:
df_temp=pd.read_csv("C{}.csv".format(symbol),index_col='Date',parse_dates=True,
usecols=['Date','Adj Close'],na_values=['nan'])
df_temp=df_temp.rename(columns={'Adj Close':symbol})
df1=df1.join(df_temp) #use default how='left'
There's already an 'SPY' column after you joined dfSPY to df1. You have 'SPY' again in your list of symbols, which is going to throw an error because pandas can't join dataframes with overlapping column names, unless you specify a suffix to distinguish the columns
Upvotes: 2
Reputation: 419
I just wanted to kinda get this question closed. So I gave up on importing the .CSV file of stocks, and just "imported" directly from Yahoo Finance. This doesn't really answer my original question, and so I still don't know what went wrong, but the following solution is much more efficient and "elegant" I feel:
import pandas as pd
import pandas.io.data as web
import datetime
start = datetime.datetime(2000,1,1)
end = datetime.date.today()
BP=web.DataReader("BP","yahoo",start,end)
SPY=web.DataReader("SPY","yahoo",start,end)
df_stocks=pd.DataFrame({"BP":BP["Adj Close"],"SPY":SPY["Adj Close"]})
df_stocks.tail()
BP SPY
Date
2017-03-07 33.869999 237.000000
2017-03-08 33.310001 236.559998
2017-03-09 33.500000 236.860001
2017-03-10 34.330002 237.690002
2017-03-13 34.070000 237.809998
Thanks to anyone who had a look.
Upvotes: 0