Reputation: 51
This is the contents of my Dataframe. The ohlc column contains the open , high, low and close values all packed as a dictionary.
In [107]: c[["average_price","ohlc"]]
Out[107]:
average_price ohlc
0 186.81 {u'high': 187.85, u'close': 188.0, u'open': 18...
1 1125.47 {u'high': 1139.0, u'close': 1110.2, u'open': 1...
2 129.82 {u'high': 130.45, u'close': 129.8, u'open': 12...
3 3125.90 {u'high': 3152.0, u'close': 3151.0, u'open': 3...
4 186.67 {u'high': 187.8, u'close': 187.95, u'open': 18...
5 39171.46 {u'high': 39400.0, u'close': 39223.0, u'open':...
6 39168.60 {u'high': 39400.0, u'close': 39225.0, u'open':...
7 191.74 {u'high': 192.5, u'close': 191.9, u'open': 191...
8 186.13 {u'high': 187.45, u'close': 187.55, u'open': 1...
9 410.45 {u'high': 412.9, u'close': 411.55, u'open': 41...
10 129.80 {u'high': 130.45, u'close': 129.8, u'open': 12...
11 3104.35 {u'high': 3132.0, u'close': 3128.0, u'open': 3...
12 23144.99 {u'high': 23225.0, u'close': 23079.0, u'open':...
13 1125.72 {u'high': 1134.0, u'close': 1109.4, u'open': 1...
14 496.41 {u'high': 503.5, u'close': 494.9, u'open': 492...
15 39157.13 {u'high': 39396.0, u'close': 39213.0, u'open':...
16 48273.22 {u'high': 48480.0, u'close': 48120.0, u'open':...
17 194.21 {u'high': 195.1, u'close': 194.5, u'open': 194...
18 3126.46 {u'high': 3153.0, u'close': 3151.0, u'open': 3...
19 149.42 {u'high': 150.3, u'close': 150.95, u'open': 15...
20 186.13 {u'high': 187.4, u'close': 187.55, u'open': 18...
21 683.03 {u'high': 696.2, u'close': 697.2, u'open': 696...
22 683.12 {u'high': 691.1, u'close': 697.2, u'open': 690...
23 29160.76 {u'high': 29300.0, u'close': 29130.0, u'open':...
24 1177.49 {u'high': 1194.0, u'close': 1159.9, u'open': 1...
25 535.56 {u'high': 539.6, u'close': 532.3, u'open': 531...
26 39474.16 {u'high': 40074.0, u'close': 39288.0, u'open':...
27 2889.90 {u'high': 2903.0, u'close': 2886.0, u'open': 2...
28 3104.41 {u'high': 3132.0, u'close': 3129.0, u'open': 3...
29 149.39 {u'high': 150.45, u'close': 150.9, u'open': 15...
30 410.53 {u'high': 413.0, u'close': 411.5, u'open': 409...
0 NaN {u'high': 4551.75, u'close': 4514.95, u'open':...
0 986.81 {u'high': 997.0, u'close': 981.25, u'open': 97...
0 2503.78 {u'high': 2535.0, u'close': 2511.45, u'open': ...
I want to access the open , high , low and close, values, as a separate column, so that i can perform the further processing on it.
Attempt 1: To get the open price from this DataFrame.
In [110]: for ohlc in c.ohlc:
...: print ohlc["open"]
...:
187.75
1118.7
129.2
3143.0
187.0
39290.0
39225.0
191.9
187.0
410.8
129.4
3111.0
23150.0
1112.4
492.5
39211.0
48000.0
194.5
3141.0
150.3
187.0
696.2
690.7
29147.0
1131.1
531.2
39288.0
2889.0
3121.0
150.45
409.25
4453.65
976.0
2499.9
However, problem with this method is I am not able to access "open" values as a dataframe columns and so cannot perform the further processing seamlessly.
Attempt 2:
In [113]: c.ohlc["high"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-113-ff70f54ba97d> in <module>()
----> 1 c.ohlc["high"]
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.pyc in get_value(self, series, key)
2426 try:
2427 return self._engine.get_value(s, k,
-> 2428 tz=getattr(series.dtype, 'tz', None))
2429 except KeyError as e1:
2430 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4363)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4046)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5022)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates (pandas\_libs\index.c:5469)()
pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer (pandas\_libs\index.c:16350)()
KeyError: 'high'
Attempt 3:
In [124]: list(c.ohlc)[2]["high"]
Out[124]: 130.45
Works for a single row. But need to know the row number for that. Doesnt serve the purpose.
There should be a better and rather simple way to get the open , high, low and close values as a column for further processing, but not able to get it through.
Also, this dataframe is continuously getting updated by a separate thread, so if i plan to extract all details row by row and then add it as a column to the dataframe for further processing, it will be too late or may even lead to error as the number of rows may have changed by then.
PS: Below is the DataFrame creation code.
import numpy as np
c=pandas.DataFrame({'average_price':[186.81, 1125.47, 129.81999999999999, 3125.9000000000001, 186.66999999999999, 39171.459999999999, 39168.599999999999, 191.74000000000001, 186.13, 410.44999999999999, 129.80000000000001, 3104.3499999999999, 23144.990000000002, 1125.72, 496.41000000000003, 39157.129999999997, 48273.220000000001, 194.21000000000001, 3126.46, 149.41999999999999, 186.13, 683.02999999999997, 683.12, 29160.759999999998, 1177.49, 535.55999999999995, 39474.160000000003, 2889.9000000000001, 3104.4099999999999, 149.38999999999999, 410.52999999999997, np.nan, 986.80999999999995, 2503.7800000000002],
'ohlc':[{'close': 188.0, 'high': 187.85, 'low': 185.55, 'open': 187.75}, {'close': 1110.2, 'high': 1139.0, 'low': 1113.6, 'open': 1118.7}, {'close': 129.8, 'high': 130.45, 'low': 128.85, 'open': 129.2}, {'close': 3151.0, 'high': 3152.0, 'low': 3105.0, 'open': 3143.0}, {'close': 187.95, 'high': 187.8, 'low': 185.6, 'open': 187.0}, {'close': 39223.0, 'high': 39400.0, 'low': 38813.0, 'open': 39290.0}, {'close': 39225.0, 'high': 39400.0, 'low': 38800.0, 'open': 39225.0}, {'close': 191.9, 'high': 192.5, 'low': 190.5, 'open': 191.9}, {'close': 187.55, 'high': 187.45, 'low': 185.05, 'open': 187.0}, {'close': 411.55, 'high': 412.9, 'low': 407.3, 'open': 410.8}, {'close': 129.8, 'high': 130.45, 'low': 128.8, 'open': 129.4}, {'close': 3128.0, 'high': 3132.0, 'low': 3082.0, 'open': 3111.0}, {'close': 23079.0, 'high': 23225.0, 'low': 23010.0, 'open': 23150.0}, {'close': 1109.4, 'high': 1134.0, 'low': 1110.0, 'open': 1112.4}, {'close': 494.9, 'high': 503.5, 'low': 489.0, 'open': 492.5}, {'close': 39213.0, 'high': 39396.0, 'low': 38801.0, 'open': 39211.0}, {'close': 48120.0, 'high': 48480.0, 'low': 47990.0, 'open': 48000.0}, {'close': 194.5, 'high': 195.1, 'low': 193.2, 'open': 194.5}, {'close': 3151.0, 'high': 3153.0, 'low': 3107.0, 'open': 3141.0}, {'close': 150.95, 'high': 150.3, 'low': 148.4, 'open': 150.3}, {'close': 187.55, 'high': 187.4, 'low': 185.1, 'open': 187.0}, {'close': 697.2, 'high': 696.2, 'low': 675.5, 'open': 696.2}, {'close': 697.2, 'high': 691.1, 'low': 675.2, 'open': 690.7}, {'close': 29130.0, 'high': 29300.0, 'low': 29032.0, 'open': 29147.0}, {'close': 1159.9, 'high': 1194.0, 'low': 1131.1, 'open': 1131.1}, {'close': 532.3, 'high': 539.6, 'low': 531.2, 'open': 531.2}, {'close': 39288.0, 'high': 40074.0, 'low': 38895.0, 'open': 39288.0}, {'close': 2886.0, 'high': 2903.0, 'low': 2862.0, 'open': 2889.0}, {'close': 3129.0, 'high': 3132.0, 'low': 3072.0, 'open': 3121.0}, {'close': 150.9, 'high': 150.45, 'low': 148.4, 'open': 150.45}, {'close': 411.5, 'high': 413.0, 'low': 407.25, 'open': 409.25}, {'close': 4514.95, 'high': 4551.75, 'low': 4399.1, 'open': 4453.65}, {'close': 981.25, 'high': 997.0, 'low': 969.6, 'open': 976.0}, {'close': 2511.45, 'high': 2535.0, 'low': 2483.0, 'open': 2499.9}]
})
Upvotes: 1
Views: 85
Reputation: 210982
Demo:
In [25]: c
Out[25]:
average_price ohlc
0 186.81 {'close': 188.0, 'high': 187.85, 'low': 185.55...
1 1125.47 {'close': 1110.2, 'high': 1139.0, 'low': 1113....
2 129.82 {'close': 129.8, 'high': 130.45, 'low': 128.85...
3 3125.90 {'close': 3151.0, 'high': 3152.0, 'low': 3105....
4 186.67 {'close': 187.95, 'high': 187.8, 'low': 185.6,...
.. ... ...
29 149.39 {'close': 150.9, 'high': 150.45, 'low': 148.4,...
30 410.53 {'close': 411.5, 'high': 413.0, 'low': 407.25,...
31 NaN {'close': 4514.95, 'high': 4551.75, 'low': 439...
32 986.81 {'close': 981.25, 'high': 997.0, 'low': 969.6,...
33 2503.78 {'close': 2511.45, 'high': 2535.0, 'low': 2483...
[34 rows x 2 columns]
In [26]: c['ohlc'].apply(pd.Series)
Out[26]:
close high low open
0 188.00 187.85 185.55 187.75
1 1110.20 1139.00 1113.60 1118.70
2 129.80 130.45 128.85 129.20
3 3151.00 3152.00 3105.00 3143.00
4 187.95 187.80 185.60 187.00
.. ... ... ... ...
29 150.90 150.45 148.40 150.45
30 411.50 413.00 407.25 409.25
31 4514.95 4551.75 4399.10 4453.65
32 981.25 997.00 969.60 976.00
33 2511.45 2535.00 2483.00 2499.90
[34 rows x 4 columns]
Performing operations:
In [27]: c['ohlc'].apply(pd.Series).loc[:, 'open']
Out[27]:
0 187.75
1 1118.70
2 129.20
3 3143.00
4 187.00
...
29 150.45
30 409.25
31 4453.65
32 976.00
33 2499.90
Name: open, Length: 34, dtype: float64
In [28]: c['ohlc'].apply(pd.Series).loc[:, 'open'].agg(['sum','mean'])
Out[28]:
sum 288932.900000
mean 8498.026471
Name: open, dtype: float64
UPDATE:
In [33]: c[['average_price']].join(c['ohlc'].apply(pd.Series)) \
.eval("avg_high = average_price - high", inplace=False)
Out[33]:
average_price close high low open avg_high
0 186.81 188.00 187.85 185.55 187.75 -1.04
1 1125.47 1110.20 1139.00 1113.60 1118.70 -13.53
2 129.82 129.80 130.45 128.85 129.20 -0.63
3 3125.90 3151.00 3152.00 3105.00 3143.00 -26.10
4 186.67 187.95 187.80 185.60 187.00 -1.13
.. ... ... ... ... ... ...
29 149.39 150.90 150.45 148.40 150.45 -1.06
30 410.53 411.50 413.00 407.25 409.25 -2.47
31 NaN 4514.95 4551.75 4399.10 4453.65 NaN
32 986.81 981.25 997.00 969.60 976.00 -10.19
33 2503.78 2511.45 2535.00 2483.00 2499.90 -31.22
[34 rows x 6 columns]
Upvotes: 1