Wacao
Wacao

Reputation: 51

how to Accessing the contents of a column containing dictionary string data within a dataframe

This is the contents of my Dataframe. The ohlc column contains the open , high, low and close values all packed as a dictionary.

In [107]: c[["average_price","ohlc"]]
Out[107]:
average_price                                               ohlc
0          186.81  {u'high': 187.85, u'close': 188.0, u'open': 18...
1         1125.47  {u'high': 1139.0, u'close': 1110.2, u'open': 1...
2          129.82  {u'high': 130.45, u'close': 129.8, u'open': 12...
3         3125.90  {u'high': 3152.0, u'close': 3151.0, u'open': 3...
4          186.67  {u'high': 187.8, u'close': 187.95, u'open': 18...
5        39171.46  {u'high': 39400.0, u'close': 39223.0, u'open':...
6        39168.60  {u'high': 39400.0, u'close': 39225.0, u'open':...
7          191.74  {u'high': 192.5, u'close': 191.9, u'open': 191...
8          186.13  {u'high': 187.45, u'close': 187.55, u'open': 1...
9          410.45  {u'high': 412.9, u'close': 411.55, u'open': 41...
10         129.80  {u'high': 130.45, u'close': 129.8, u'open': 12...
11        3104.35  {u'high': 3132.0, u'close': 3128.0, u'open': 3...
12       23144.99  {u'high': 23225.0, u'close': 23079.0, u'open':...
13        1125.72  {u'high': 1134.0, u'close': 1109.4, u'open': 1...
14         496.41  {u'high': 503.5, u'close': 494.9, u'open': 492...
15       39157.13  {u'high': 39396.0, u'close': 39213.0, u'open':...
16       48273.22  {u'high': 48480.0, u'close': 48120.0, u'open':...
17         194.21  {u'high': 195.1, u'close': 194.5, u'open': 194...
18        3126.46  {u'high': 3153.0, u'close': 3151.0, u'open': 3...
19         149.42  {u'high': 150.3, u'close': 150.95, u'open': 15...
20         186.13  {u'high': 187.4, u'close': 187.55, u'open': 18...
21         683.03  {u'high': 696.2, u'close': 697.2, u'open': 696...
22         683.12  {u'high': 691.1, u'close': 697.2, u'open': 690...
23       29160.76  {u'high': 29300.0, u'close': 29130.0, u'open':...
24        1177.49  {u'high': 1194.0, u'close': 1159.9, u'open': 1...
25         535.56  {u'high': 539.6, u'close': 532.3, u'open': 531...
26       39474.16  {u'high': 40074.0, u'close': 39288.0, u'open':...
27        2889.90  {u'high': 2903.0, u'close': 2886.0, u'open': 2...
28        3104.41  {u'high': 3132.0, u'close': 3129.0, u'open': 3...
29         149.39  {u'high': 150.45, u'close': 150.9, u'open': 15...
30         410.53  {u'high': 413.0, u'close': 411.5, u'open': 409...
0             NaN  {u'high': 4551.75, u'close': 4514.95, u'open':...
0          986.81  {u'high': 997.0, u'close': 981.25, u'open': 97...
0         2503.78  {u'high': 2535.0, u'close': 2511.45, u'open': ...

I want to access the open , high , low and close, values, as a separate column, so that i can perform the further processing on it.

Attempt 1: To get the open price from this DataFrame.

In [110]: for ohlc in c.ohlc:
     ...:     print ohlc["open"]
     ...:
187.75
1118.7
129.2
3143.0
187.0
39290.0
39225.0
191.9
187.0
410.8
129.4
3111.0
23150.0
1112.4
492.5
39211.0
48000.0
194.5
3141.0
150.3
187.0
696.2
690.7
29147.0
1131.1
531.2
39288.0
2889.0
3121.0
150.45
409.25
4453.65
976.0
2499.9

However, problem with this method is I am not able to access "open" values as a dataframe columns and so cannot perform the further processing seamlessly.

Attempt 2:

In [113]: c.ohlc["high"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-113-ff70f54ba97d> in <module>()
----> 1 c.ohlc["high"]

C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
    599         key = com._apply_if_callable(key, self)
    600         try:
--> 601             result = self.index.get_value(self, key)
    602
    603             if not is_scalar(result):

C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.pyc in get_value(self, series, key)
   2426         try:
   2427             return self._engine.get_value(s, k,
-> 2428                                           tz=getattr(series.dtype, 'tz', None))
   2429         except KeyError as e1:
   2430             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4363)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4046)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5022)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates (pandas\_libs\index.c:5469)()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer (pandas\_libs\index.c:16350)()

KeyError: 'high'

Attempt 3:

In [124]: list(c.ohlc)[2]["high"]
Out[124]: 130.45

Works for a single row. But need to know the row number for that. Doesnt serve the purpose.

There should be a better and rather simple way to get the open , high, low and close values as a column for further processing, but not able to get it through.

Also, this dataframe is continuously getting updated by a separate thread, so if i plan to extract all details row by row and then add it as a column to the dataframe for further processing, it will be too late or may even lead to error as the number of rows may have changed by then.

PS: Below is the DataFrame creation code.

import numpy as np
c=pandas.DataFrame({'average_price':[186.81, 1125.47, 129.81999999999999, 3125.9000000000001, 186.66999999999999, 39171.459999999999, 39168.599999999999, 191.74000000000001, 186.13, 410.44999999999999, 129.80000000000001, 3104.3499999999999, 23144.990000000002, 1125.72, 496.41000000000003, 39157.129999999997, 48273.220000000001, 194.21000000000001, 3126.46, 149.41999999999999, 186.13, 683.02999999999997, 683.12, 29160.759999999998, 1177.49, 535.55999999999995, 39474.160000000003, 2889.9000000000001, 3104.4099999999999, 149.38999999999999, 410.52999999999997, np.nan, 986.80999999999995, 2503.7800000000002],
                    'ohlc':[{'close': 188.0, 'high': 187.85, 'low': 185.55, 'open': 187.75}, {'close': 1110.2, 'high': 1139.0, 'low': 1113.6, 'open': 1118.7}, {'close': 129.8, 'high': 130.45, 'low': 128.85, 'open': 129.2}, {'close': 3151.0, 'high': 3152.0, 'low': 3105.0, 'open': 3143.0}, {'close': 187.95, 'high': 187.8, 'low': 185.6, 'open': 187.0}, {'close': 39223.0, 'high': 39400.0, 'low': 38813.0, 'open': 39290.0}, {'close': 39225.0, 'high': 39400.0, 'low': 38800.0, 'open': 39225.0}, {'close': 191.9, 'high': 192.5, 'low': 190.5, 'open': 191.9}, {'close': 187.55, 'high': 187.45, 'low': 185.05, 'open': 187.0}, {'close': 411.55, 'high': 412.9, 'low': 407.3, 'open': 410.8}, {'close': 129.8, 'high': 130.45, 'low': 128.8, 'open': 129.4}, {'close': 3128.0, 'high': 3132.0, 'low': 3082.0, 'open': 3111.0}, {'close': 23079.0, 'high': 23225.0, 'low': 23010.0, 'open': 23150.0}, {'close': 1109.4, 'high': 1134.0, 'low': 1110.0, 'open': 1112.4}, {'close': 494.9, 'high': 503.5, 'low': 489.0, 'open': 492.5}, {'close': 39213.0, 'high': 39396.0, 'low': 38801.0, 'open': 39211.0}, {'close': 48120.0, 'high': 48480.0, 'low': 47990.0, 'open': 48000.0}, {'close': 194.5, 'high': 195.1, 'low': 193.2, 'open': 194.5}, {'close': 3151.0, 'high': 3153.0, 'low': 3107.0, 'open': 3141.0}, {'close': 150.95, 'high': 150.3, 'low': 148.4, 'open': 150.3}, {'close': 187.55, 'high': 187.4, 'low': 185.1, 'open': 187.0}, {'close': 697.2, 'high': 696.2, 'low': 675.5, 'open': 696.2}, {'close': 697.2, 'high': 691.1, 'low': 675.2, 'open': 690.7}, {'close': 29130.0, 'high': 29300.0, 'low': 29032.0, 'open': 29147.0}, {'close': 1159.9, 'high': 1194.0, 'low': 1131.1, 'open': 1131.1}, {'close': 532.3, 'high': 539.6, 'low': 531.2, 'open': 531.2}, {'close': 39288.0, 'high': 40074.0, 'low': 38895.0, 'open': 39288.0}, {'close': 2886.0, 'high': 2903.0, 'low': 2862.0, 'open': 2889.0}, {'close': 3129.0, 'high': 3132.0, 'low': 3072.0, 'open': 3121.0}, {'close': 150.9, 'high': 150.45, 'low': 148.4, 'open': 150.45}, {'close': 411.5, 'high': 413.0, 'low': 407.25, 'open': 409.25}, {'close': 4514.95, 'high': 4551.75, 'low': 4399.1, 'open': 4453.65}, {'close': 981.25, 'high': 997.0, 'low': 969.6, 'open': 976.0}, {'close': 2511.45, 'high': 2535.0, 'low': 2483.0, 'open': 2499.9}]
                    })

Upvotes: 1

Views: 85

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210982

Demo:

In [25]: c
Out[25]:
    average_price                                               ohlc
0          186.81  {'close': 188.0, 'high': 187.85, 'low': 185.55...
1         1125.47  {'close': 1110.2, 'high': 1139.0, 'low': 1113....
2          129.82  {'close': 129.8, 'high': 130.45, 'low': 128.85...
3         3125.90  {'close': 3151.0, 'high': 3152.0, 'low': 3105....
4          186.67  {'close': 187.95, 'high': 187.8, 'low': 185.6,...
..            ...                                                ...
29         149.39  {'close': 150.9, 'high': 150.45, 'low': 148.4,...
30         410.53  {'close': 411.5, 'high': 413.0, 'low': 407.25,...
31            NaN  {'close': 4514.95, 'high': 4551.75, 'low': 439...
32         986.81  {'close': 981.25, 'high': 997.0, 'low': 969.6,...
33        2503.78  {'close': 2511.45, 'high': 2535.0, 'low': 2483...

[34 rows x 2 columns]

In [26]: c['ohlc'].apply(pd.Series)
Out[26]:
      close     high      low     open
0    188.00   187.85   185.55   187.75
1   1110.20  1139.00  1113.60  1118.70
2    129.80   130.45   128.85   129.20
3   3151.00  3152.00  3105.00  3143.00
4    187.95   187.80   185.60   187.00
..      ...      ...      ...      ...
29   150.90   150.45   148.40   150.45
30   411.50   413.00   407.25   409.25
31  4514.95  4551.75  4399.10  4453.65
32   981.25   997.00   969.60   976.00
33  2511.45  2535.00  2483.00  2499.90

[34 rows x 4 columns]

Performing operations:

In [27]: c['ohlc'].apply(pd.Series).loc[:, 'open']
Out[27]:
0      187.75
1     1118.70
2      129.20
3     3143.00
4      187.00
       ...
29     150.45
30     409.25
31    4453.65
32     976.00
33    2499.90
Name: open, Length: 34, dtype: float64

In [28]: c['ohlc'].apply(pd.Series).loc[:, 'open'].agg(['sum','mean'])
Out[28]:
sum     288932.900000
mean      8498.026471
Name: open, dtype: float64

UPDATE:

In [33]: c[['average_price']].join(c['ohlc'].apply(pd.Series)) \
                             .eval("avg_high = average_price - high", inplace=False)
Out[33]:
    average_price    close     high      low     open  avg_high
0          186.81   188.00   187.85   185.55   187.75     -1.04
1         1125.47  1110.20  1139.00  1113.60  1118.70    -13.53
2          129.82   129.80   130.45   128.85   129.20     -0.63
3         3125.90  3151.00  3152.00  3105.00  3143.00    -26.10
4          186.67   187.95   187.80   185.60   187.00     -1.13
..            ...      ...      ...      ...      ...       ...
29         149.39   150.90   150.45   148.40   150.45     -1.06
30         410.53   411.50   413.00   407.25   409.25     -2.47
31            NaN  4514.95  4551.75  4399.10  4453.65       NaN
32         986.81   981.25   997.00   969.60   976.00    -10.19
33        2503.78  2511.45  2535.00  2483.00  2499.90    -31.22

[34 rows x 6 columns]

Upvotes: 1

Related Questions