MRL
MRL

Reputation: 419

Importing Two Financial Stocks with Python

So, I'm trying to import the stock prices for S&P500 (SPY) BP, the (O&G/energy company). The result I am looking for is a "table" of 3 columns; 1 for dates, 1 for the Adj Close of SPY and 1 for the Adj Close of BP. However, my code produces:

ValueError: columns overlap but no suffix specified: Index(['SPY'], dtype='object')

I understand what this error is telling me though: The index column “Adj Close” has an overlap. Irrespective of the stock, the column we are extracting each time is called “SPY”. The join() method I am using is confused because column names must be unique, well, something like that is how I've interpreted it...

The code:

import pandas as pd

def test_run():
    start_date=('2016-03-10') #start date parameter
    end_date=('2017-03-10') #end date parameter
    dates=pd.date_range(start_date,end_date)
    df1=pd.DataFrame(index=dates) #create empty dataframe df1

    dfSPY=pd.read_csv("C:\SPY.csv",index_col="Date",parse_dates=True,
                  usecols=['Date','Adj Close'],na_values=['nan']) 
    #create dataframe for SPY stock

    #rename Adj Close column to SPY to prevent clash
    dfSPY=dfSPY.rename(columns={'Adj Close':'SPY'})

    #join the 2 dataframes using DataFrame.join(), and how='inner'
    df1=df1.join(dfSPY,how='inner')

    #read in more stocks; SPY & BP
    symbols=['SPY','BP']

    for symbol in symbols:
          df_temp=pd.read_csv("C{}.csv".format(symbol),index_col='Date',parse_dates=True,
                  usecols=['Date','Adj Close'],na_values=['nan'])

    #rename to prevent clash
    df_temp=df_temp.rename(columns={'Adj Close':symbol})
    df1=df1.join(df_temp) #use default how='left'
print(df1)

if __name__=="__main__":
test_run()

So, that's the code I've got. If there's anyone out there who can shed some light as to what an Earth I've done wrong, please let me know.

Many thanks!

Upvotes: 1

Views: 159

Answers (2)

user3582076
user3582076

Reputation: 1343

The code you provided is overriding the value of df_temp in your for loop- it will only end up with the value assigned during the last iteration. I assume the last two lines posted below are actually inside your for loop:

for symbol in symbols:
    df_temp=pd.read_csv("C{}.csv".format(symbol),index_col='Date',parse_dates=True,
              usecols=['Date','Adj Close'],na_values=['nan'])

    df_temp=df_temp.rename(columns={'Adj Close':symbol})
    df1=df1.join(df_temp) #use default how='left'

There's already an 'SPY' column after you joined dfSPY to df1. You have 'SPY' again in your list of symbols, which is going to throw an error because pandas can't join dataframes with overlapping column names, unless you specify a suffix to distinguish the columns

Upvotes: 2

MRL
MRL

Reputation: 419

I just wanted to kinda get this question closed. So I gave up on importing the .CSV file of stocks, and just "imported" directly from Yahoo Finance. This doesn't really answer my original question, and so I still don't know what went wrong, but the following solution is much more efficient and "elegant" I feel:

import pandas as pd
import pandas.io.data as web
import datetime

start = datetime.datetime(2000,1,1)
end = datetime.date.today()

BP=web.DataReader("BP","yahoo",start,end)
SPY=web.DataReader("SPY","yahoo",start,end)
df_stocks=pd.DataFrame({"BP":BP["Adj Close"],"SPY":SPY["Adj Close"]})

df_stocks.tail()

               BP         SPY
Date                             
2017-03-07    33.869999   237.000000
2017-03-08    33.310001   236.559998
2017-03-09    33.500000   236.860001
2017-03-10    34.330002   237.690002
2017-03-13    34.070000   237.809998

Thanks to anyone who had a look.

Upvotes: 0

Related Questions