Reputation: 1293
I'm trying to separate elements from a list into different columns of a pandas dataframe. Essentially I want, for every tenure
option - i.e. Detached, Semi-detached, columns like detached_price
, detached_cost
, detached_rooms
and detached_asking
, then the same for Semi-detached, Terraced, Flats etc.
p = [{'br8': [{'tenure': 'Detached',
'data': ['£1,248,554', '£571', '4.3', '£1,063,001']},
{'tenure': 'Semi-detached',
'data': ['£581,968', '£499', '3.3', '£587,188']},
{'tenure': 'Terraced', 'data': ['£520,725', '£516', '3.0', '£474,719']},
{'tenure': 'Flats', 'data': ['£424,898', '£516', '2.0', '£394,092']}]}]
I've tried this so far, but it won't parse the columns correctly. Does anybody have any advice or direction as to how to achieve my goal here?
pd.DataFrame.from_records(p).T
My desired output is:
detached_price, detached_cost, detached_rooms, detached_asking, semi_detached_price, etc etc
br8 £1,248,554, £571 , 4.3 , £1,063,001, £581,968
Upvotes: 1
Views: 134
Reputation: 23099
This will be quite a long dataframe, but the below should work :
first we import some modules, and assign your columns, I'm assuming you have a full set of data and no NA values. if you do, you'll need to figure out a way to map your ask, cost, room into your dataframe.
from collections import defaultdict
from itertools import cycle
import pandas as pd
dfs = defaultdict(list)
for index,y in p[0].items():
for _ in y:
for key, value in _.items():
dfs[key].append(value)
dfs['index'] = index
df = pd.DataFrame(dfs).set_index('index')
df = df.explode('data')
status = cycle( ['price','cost','room','ask'])
df['status'] = [next(status) for stat in range(len(df))]
df['tenure'] = df['tenure'] + '_' + df['status']
final = pd.crosstab(df.index,df.tenure,values=df.data,aggfunc='first')
print(final.iloc[:,:4])
tenure Detached_ask Detached_cost Detached_price Detached_room
postcode?
br8 £1,063,001 £571 £1,248,554 4.3
Upvotes: 1