Reputation: 857
So I have a CSV with minute-level stock data for Microsoft, and I am trying to find the low of each trading day. The code looks like:
import pandas as pd

ticker = 'MSFT'
df = pd.read_csv('/Volumes/Seagate Portable/S&P 500 List/{}.txt'.format(ticker))
df.columns = ['Extra', 'Dates', 'Open', 'High', 'Low', 'Close', 'Volume']
df.Dates = pd.to_datetime(df.Dates)
df.set_index(df.Dates, inplace=True)
df.drop(['Extra', 'High', 'Volume', 'Dates', 'Open'], axis=1, inplace=True)
df = df.between_time('9:30', '16:00')
df['Low'] = df.Low.groupby(by=[df.index.day]).min()
df
The output is:
Low Close
Dates
2020-01-02 09:30:00 NaN 158.610
2020-01-02 09:31:00 NaN 158.380
2020-01-02 09:32:00 NaN 158.620
2020-01-02 09:33:00 NaN 158.692
2020-01-02 09:34:00 NaN 158.910
... ... ...
2020-12-18 15:56:00 NaN 218.700
2020-12-18 15:57:00 NaN 218.540
2020-12-18 15:58:00 NaN 218.710
2020-12-18 15:59:00 NaN 218.150
2020-12-18 16:00:00 NaN 218.500
So the issue is that the Low column is filled with NaN values, which I am assuming is because I am misusing groupby. I have also tried:
ticker='MSFT'
df = pd.read_csv('/Volumes/Seagate Portable/S&P 500 List/{}.txt'.format(ticker))
df.columns = ['Extra', 'Dates', 'Open', 'High', 'Low', 'Close', 'Volume']
df.Dates = pd.to_datetime(df.Dates)
df.set_index(df.Dates, inplace=True)
df.drop(['Extra', 'High', 'Volume', 'Dates', 'Open'], axis=1, inplace=True)
df = df.between_time('9:30', '16:00')
df = df.groupby(by=[df.index.day]).min()
df
The output of this is:
Low Close
Dates
1 150.8200 150.9800
2 150.3600 150.8400
3 152.1900 152.2800
4 165.6200 165.7000
5 165.6900 165.8200
6 156.0000 156.0700
7 157.3200 157.3500
8 157.9491 158.0000
9 150.0000 150.2700
10 152.5800 152.7950
11 151.1500 151.1930
12 138.5800 138.7600
13 140.7300 140.8700
14 161.7200 161.7500
15 162.5700 162.6300
16 135.0000 135.3300
17 135.0000 135.3400
18 135.0200 135.2600
19 139.0000 139.1300
20 135.8600 136.5900
21 166.1102 166.2100
22 165.6800 165.6900
23 132.5200 132.7100
24 141.2700 141.6481
25 144.4400 144.8102
26 148.3700 149.7000
27 149.2000 149.2700
28 152.0000 153.8152
29 165.6900 165.7952
30 150.0100 152.7200
31 156.5600 157.0450
The issue with this is that it is taking the minimum of both the Low and Close columns. Also, there are only 31 rows in total, though there should be many more given that this dataset covers all of 2020. I assume I am grouping incorrectly, because I checked the close prices of each day for the first 31 days, and there is no way these are the lows of those days. So the question is: how can I find the low of each day without affecting the Close column, while avoiding the issues mentioned above?
Upvotes: 1
Views: 223
Reputation: 784
Try this:
# Collect each calendar date in the index, sorted chronologically
unique_dates = sorted({str(d.date()) for d in df.index})
# .loc with a date string selects that whole day; the daily low is Low's minimum
min_values_daily = [df.loc[date, 'Low'].min() for date in unique_dates]
And finally, create a new DataFrame:
low_data = pd.DataFrame({
    'date': unique_dates,
    'low': min_values_daily
})
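If you would rather stay entirely in pandas, here is a minimal sketch of a groupby alternative, assuming the same minute-indexed frame from the question (daily_low and the DailyLow column are illustrative names). Grouping by df.index.date keeps each calendar day distinct, whereas df.index.day lumps together, say, January 2nd and February 2nd, which is why the second attempt returned only 31 rows. The NaNs in the first attempt come from index alignment: a plain .min() returns one row per group, which cannot align back against the minute-level index, while transform('min') broadcasts the daily low onto every minute row:
# One low per trading day: group by full calendar dates, not by day-of-month
daily_low = df.groupby(df.index.date)['Low'].min()

# Or broadcast the daily low onto every minute row, leaving Close untouched
df['DailyLow'] = df.groupby(df.index.date)['Low'].transform('min')
Selecting ['Low'] before aggregating also means the Close column is never touched, which addresses the concern about the minimum being taken across both columns.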
Upvotes: 1