Reputation: 165
This works, BUT the outputs are not matched on the index (Date). Instead, each new column is appended starting after the previous dataframe's last row, i.e. the data is stacked "on top" of each other, so the Date index is repeated. Is there a way to iterate and create columns that are matched by Date?
indexReturnData = pd.DataFrame()
indexcount = int(input("How many indices do you want to use? Enter a quantity: "))
i = 0
while i < indexcount:
    indexList = pd.read_sql_query('SELECT * FROM Instruments', conn)
    indexList = indexList[['InstrumentName', 'InstrumentID']]
    indexListFind = input('Enter partial index name: ')
    indexList = indexList[indexList['InstrumentName'].str.contains(indexListFind, case=False)]
    # TODO: add an if/else in case the search returns no matches
    print(indexList)
    indexID = input('Pick/Type in an INDEX list ID ("InstrumentID") from the list above: ')
    indexName = indexList.query('InstrumentID == ' + str(indexID))['InstrumentName']
    indexName = list(indexName)
    indexReturns = pd.read_sql_query("""
        SELECT *
        FROM InstrumentPrices
        WHERE InstrumentID=""" + indexID, conn)
    indexReturns = indexReturns.filter(['ReportingDate', 'Returns'])
    indexReturns = indexReturns.rename(columns={'ReportingDate': 'Date', 'Returns': indexName[0]})
    indexReturns = indexReturns.set_index('Date')
    indexReturnData = indexReturnData.append(indexReturns)
    i += 1
Output:
           Date  S&P500  S&P600
308    9/1/1995   0.042
309   10/1/1995  -0.004
310   11/1/1995   0.044
311   12/1/1995   0.019
...         ...     ...     ...
603    4/1/2020   0.128
604    5/1/2020   0.048
605    6/1/2020   0.020
606    7/1/2020   0.056
623    9/1/1995           0.025
624   10/1/1995          -0.050
625   11/1/1995           0.038
626   12/1/1995           0.016
...         ...     ...     ...
918    4/1/2020           0.126
919    5/1/2020           0.041
920    6/1/2020           0.036
921    7/1/2020           0.040
Thanks!
Upvotes: 0
Views: 98
Reputation: 3113
Based on your current output and what I think your desired output is, you can get away with just df.groupby('Date').sum(). Running that groups any duplicate rows in the 'Date' column and sums the values it finds for each column. If I'm understanding right, each column only has a single value per date, so the "sum" of that single number just returns the number itself.
I copied the little output section you have above (removing the blank cells), ran df.groupby('Date').sum(), and got this:
S&P500 S&P600
Date
10/1/1995 -0.004 -0.050
11/1/1995 0.044 0.038
12/1/1995 0.019 0.016
4/1/2020 0.128 0.126
5/1/2020 0.048 0.041
6/1/2020 0.020 0.036
7/1/2020 0.056 0.040
9/1/1995 0.042 0.025
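For reference, here is a minimal self-contained sketch of the same idea on toy data (the frame names and values are illustrative, not from the original database). It reproduces the stacked shape from the question with pd.concat and then collapses the duplicate dates with groupby('Date').sum():

```python
import pandas as pd

# Two per-index frames, each indexed by Date (stand-ins for the SQL results).
sp500 = pd.DataFrame(
    {'Date': ['9/1/1995', '10/1/1995'], 'S&P500': [0.042, -0.004]}
).set_index('Date')
sp600 = pd.DataFrame(
    {'Date': ['9/1/1995', '10/1/1995'], 'S&P600': [0.025, -0.050]}
).set_index('Date')

# Row-wise concatenation stacks the frames, repeating the Date index
# (this is what append() in the question produces).
stacked = pd.concat([sp500, sp600])

# groupby('Date').sum() collapses the duplicate dates: NaNs are skipped,
# so each cell ends up with the single real value for that date.
merged = stacked.groupby('Date').sum()

# An alternative that avoids the stacking entirely is to join on the index:
joined = sp500.join(sp600)
```

The join approach may be preferable inside the original loop (indexReturnData = indexReturnData.join(indexReturns, how='outer') after the first iteration), since it matches on Date directly instead of stacking and re-grouping.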
Upvotes: 1