Reputation: 23
I'm relatively new to Python, but I've been working on a pet project looking at housing markets. Effectively, I have a series of files that I've read in and stored as variables, with the raw format as follows:
cityrentalprice1BR.columns
Index([u'RegionName', u'State', u'Metro', u'CountyName', u'SizeRank',
u'2010-02', u'2010-03', u'2010-04', u'2010-05', u'2010-06',
...
u'2017-09', u'2017-10', u'2017-11', u'2017-12', u'2018-01', u'2018-02',
u'2018-03', u'2018-04', u'2018-05', u'2018-06'],
dtype='object', length=106)
The "wide" formatting is not particularly helpful for my exercise so I've used the melt function to have a better timeseries by city and the results are great on single dataframe:
cityrentalprice1BR_melt = pd.melt(cityrentalprice1BR, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                                  value_vars=dates,
                                  value_name='Value',
                                  var_name=['Date'])
cityrentalprice1BR_melt['Date'] = pd.to_datetime(cityrentalprice1BR_melt['Date'], format='%Y-%m')
My problem is that I have multiple files to import with the same formatting, and I'd prefer not to melt each dataframe manually. One big caveat: I'd prefer to have the results pushed into new, unique dataframes (i.e. in the example above, cityrentalprice1BR > cityrentalprice1BR_melt).
I've been working on this for a while and have created a list of dataframes, and I can apply the melt function to the entire list as follows. However, this misses my key goal of retaining separate dataframes (or separate column sets per dataframe, if a merge function is more appropriate):
Rental = [cityrentalprice1BR, cityrentalprice2BR, cityrentalprice3BR, cityrentalprice4BR, cityrentalprice5BR, cityrentalpriceCondo, cityrentalpriceDupTri]
for size in Rental:
    transformrent = pd.melt(size, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                            value_vars=dates,
                            value_name='Value',
                            var_name=['Date'])
Any guidance would be much appreciated.
Upvotes: 2
Views: 3136
Reputation: 107747
In both Pandas and R, I have often advised always using a single collection of many similar objects rather than flooding your global environment with separate variables. This lets you manage one large object instead of an open-ended number of variables whose names you may forget. Imagine navigating through 10,000 data frames! Better to navigate 10,000 indices of one list or 10,000 keys of one dictionary.
You lose zero DataFrame functionality if you store the elements in a list, a dictionary, or even a named tuple. Plus, if you ever need to run operations across all of them, you already have a container for such operations. Below are various collections of melted data frames, built with a defined function, along with demonstrations of DataFrame methods and even bulk operations like pd.concat.
Function
def melt_data(df):
    mdf = pd.melt(df, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                  value_vars=dates, value_name='Value', var_name='Date')
    return mdf
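Side note: melt_data assumes the dates list from the question already exists. A minimal sketch of building it, assuming the date columns are simply every column that is not one of the five id columns:

id_cols = ['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank']
dates = [c for c in cityrentalprice1BR.columns if c not in id_cols]

Alternatively, leaving value_vars out of pd.melt entirely melts every column not listed in id_vars, which amounts to the same thing here.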
List
Rental_list = [cityrentalprice1BR, cityrentalprice2BR, cityrentalprice3BR, cityrentalprice4BR,
cityrentalprice5BR, cityrentalpriceCondo, cityrentalpriceDupTri]
# LIST COMPREHENSION
df_list = [melt_data(size) for size in Rental_list]
df_list[0].shape
df_list[1].head()
df_list[2].tail()
df_list[3].sum()
df_list[4].mean()
# APPEND ALL
master_df = pd.concat(df_list, ignore_index=True)
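If you also want to know which frame each row of master_df came from, one option (a sketch; the size labels below are only assumed from the question's variable names) is to pass keys to pd.concat and move that level into a column:

sizes = ['1BR', '2BR', '3BR', '4BR', '5BR', 'Condo', 'DupTri']   # assumed labels
# keys= adds an outer index level naming the source frame
master_df = pd.concat(df_list, keys=sizes, names=['Size']).reset_index(level='Size')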
Dictionary
Rental_dict = {'1BR': cityrentalprice1BR, '2BR': cityrentalprice2BR, '3BR': cityrentalprice3BR,
'4BR': cityrentalprice4BR, '5BR': cityrentalprice5BR,
'Condo': cityrentalpriceCondo, 'DupTri': cityrentalpriceDupTri}
# DICTIONARY COMPREHENSION
df_dict = {i+'_melt': melt_data(size) for i, size in Rental_dict.items()}
df_dict['1BR_melt'].shape
df_dict['2BR_melt'].head()
df_dict['3BR_melt'].tail()
df_dict['Condo_melt'].sum()
df_dict['DupTri_melt'].mean()
# APPEND ALL
master_df = pd.concat(df_dict, ignore_index=True)
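Note that pd.concat on a dictionary already uses the keys as an outer index level when ignore_index is left off; a sketch of keeping that key as a regular column instead:

master_df = pd.concat(df_dict)   # '1BR_melt', '2BR_melt', ... become the outer index level
master_df = master_df.reset_index(level=0).rename(columns={'level_0': 'Size'})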
Named tuple
from collections import namedtuple
melt_dfs = namedtuple('melt_dfs', 'OneBR TwoBR ThreeBR FourBR FiveBR Condo DupTri')
df_tuple = melt_dfs(*[melt_data(size) for size in Rental_list])
# df_tuple = melt_dfs(*[melt_data(size) for i, size in Rental_dict.items()])
df_tuple.OneBR.shape
df_tuple.TwoBR.head()
df_tuple.ThreeBR.tail()
df_tuple.FourBR.sum()
df_tuple.FiveBR.mean()
# APPEND ALL
master_df = pd.concat(df_tuple, ignore_index=True)
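One small convenience of the named tuple (a sketch using the standard _fields attribute) is that you can loop over field name and frame together:

for name, mdf in zip(df_tuple._fields, df_tuple):
    print(name, mdf.shape)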
Separate Variables :(
Nonetheless, if you really, really need separate objects, simply unpack from the list of melted dataframes:
cityrentalprice1BR_melt, cityrentalprice2BR_melt, cityrentalprice3BR_melt, \
cityrentalprice4BR_melt, cityrentalprice5BR_melt, \
cityrentalpriceCondo_melt, cityrentalpriceDupTri_melt = [melt_data(size) for size in Rental_list]
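And if the raw files come straight from disk, you can skip the separate raw variables altogether; a sketch, with purely hypothetical CSV filenames keyed by size:

# filenames below are hypothetical placeholders
files = {'1BR': 'cityrentalprice1BR.csv', '2BR': 'cityrentalprice2BR.csv',
         'Condo': 'cityrentalpriceCondo.csv'}
df_dict = {size + '_melt': melt_data(pd.read_csv(f)) for size, f in files.items()}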
Upvotes: 8