Reputation: 31
I'm just starting out, so I try to build things that work first and then think of how can I improve the code.
I've been working with CoinGecko's API to dump data like prices. The first issue I got is that query returns a list of lists. Each entry contains a UNIX timestamp and a value.
First, I used pandas to put this data into a DataFrame.
data = cg.get_coin_market_chart_by_id('bitcoin', 'USD', 'max')
df = pd.DataFrame(data)
It returned a DataFrame with each cell containing a list with a UNIX timestamp and a value.
Obviously, I wasn't happy with each cell containing a UNIX timestamp. So, I made 3 DataFrames out of each Series. I also formatted UNIX timestamps in new indexes to datetime in each.
price = df['prices'].apply(pd.Series)
price.columns = ['date', 'price']
price = price.set_index(['date'])
price.index = pd.to_datetime(price.index, unit = 'ms')
price.columns = ['price']
market_cap = pd.DataFrame(df.market_caps.values.tolist(), index = df.index)
market_cap = market_cap.set_index(0)
market_cap.index = pd.to_datetime(market_cap.index, unit = 'ms')
market_cap.index.names = ['date']
market_cap.columns = ['market_cap']
volume = pd.DataFrame(df.total_volumes.values.tolist(), index = df.index)
volume = volume.set_index(0)
volume.index = pd.to_datetime(volume.index, unit = 'ms')
volume.index.names = ['date']
volume.columns = ['volume']
Finally, I concatenated all 3.
dfs = [price, market_cap, volume]
conc = pd.concat(dfs, axis = 1, sort = False)
I'm not a CS guy or anything, but I want to learn how to manipulate data well. I let you, wizards of StackOverflow, use whatever unpleasant words when describing my code as long as it helps me to improve. Thanks.
Upvotes: 0
Views: 121
Reputation: 95938
In this particular case, pd.DataFrame
accepts a dictionary like this:
{column0:{index0:value0, index1: value1, ...}, ...}
So, just transform your input data by making a dict out of the inner lists:
In [22]: import pandas as pd
In [23]: data ={
...: 'prices': [[1367107200000, 135.3], [1367193600000, 141.96]],
...: 'market_caps': [[1367107200000, 1500517590], [1367193600000, 1575032004.0]],
...: 'total_volumes': [[1367107200000, 0], [1367193600000, 0.0]]
...: }
...:
In [24]: pd.DataFrame({k:dict(v) for k,v in data.items()})
Out[24]:
prices market_caps total_volumes
1367107200000 135.30 1.500518e+09 0.0
1367193600000 141.96 1.575032e+09 0.0
And to get an actual datetime index, use:
In [26]: df.set_index(pd.to_datetime(df.index,unit='ms'))
Out[26]:
prices market_caps total_volumes
2013-04-28 135.30 1.500518e+09 0.0
2013-04-29 141.96 1.575032e+09 0.0
or even in one, fell, swoop:
In [28]: from datetime import datetime
...: pd.DataFrame({
...: k:{datetime.fromtimestamp(x/1000): y for x,y in v}
...: for k,v in data.items()
...: })
Out[28]:
prices market_caps total_volumes
2013-04-27 17:00:00 135.30 1.500518e+09 0.0
2013-04-28 17:00:00 141.96 1.575032e+09 0.0
Although that's getting a bit ugly, IMO.
Upvotes: 1