Reputation: 10862
I have a Series that looks like this:
1999-03-31 SOLD_PRICE NaN
1999-06-30 SOLD_PRICE NaN
1999-09-30 SOLD_PRICE NaN
1999-12-31 SOLD_PRICE 3.00
2000-03-31 SOLD_PRICE 3.00
with an index that looks like:
MultiIndex
[(1999-03-31 00:00:00, u'SOLD_PRICE'), (1999-06-30 00:00:00, u'SOLD_PRICE'),
(1999-09-30 00:00:00, u'SOLD_PRICE'), (1999-12-31 00:00:00, u'SOLD_PRICE'),...]
I don't want the second column as an index. Ideally I'd have a DataFrame with column 1 as "Date" and column 2 as "Sales" (dropping the second level of the index). I don't quite see how to reconfigure the index.
Upvotes: 41
Views: 126440
Reputation: 773
numpy.squeeze()
Having a multi-index dataframe :
import pandas as pd
df = pd.DataFrame([[1,2], [2,3]], columns=[['col1', 'col2']])
print(df.columns)
MultiIndex([('col1',), ('col2',)], )
Then "squeeze" values with np.squeeze()
:
import numpy as np
val = np.squeeze(df.values).T
df_ = pd.DataFrame(val, [x[0] for x in df.columns]).T
print(df_.columns)
Index(['col1', 'col2'], dtype='object')
Upvotes: 0
Reputation: 23011
To remove a specific level from a MultiIndex, use droplevel
. For example, to remove the second level:
H6 = H6.droplevel(1)
Then to convert the Series into a dataframe, use to_frame
.
H6 = H6.to_frame('Sales')
If you want to convert the Series into a dataframe with 2 columns, then droplevel()
+ reset_index()
or two reset_index()
calls with different arguments will do the job.
H6 = H6.droplevel(1).rename_axis('Date').reset_index(name='Sales')
# or
H6 = H6.reset_index(level=0, name='Sales').reset_index(drop=True)
Upvotes: 0
Reputation: 879083
When you use double brackets, such as
H3 = H2[['SOLD_PRICE']]
H3 becomes a DataFrame. If you use single brackets,
H3 = H2['SOLD_PRICE']
then H3 becomes a Series. If H3 is a Series, then the result you desire follows naturally:
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=72, freq='M')
H2 = pd.DataFrame(np.arange(len(rng)), index=rng, columns=['SOLD_PRICE'])
H3 = H2['SOLD_PRICE']
H5 = H3.resample('Q', how='count')
H6 = pd.rolling_mean(H5,4)
print(H6.head())
yields
2011-03-31 NaN
2011-06-30 NaN
2011-09-30 NaN
2011-12-31 3
2012-03-31 3
dtype: float64
Upvotes: 16
Reputation: 25652
Just call reset_index()
:
In [130]: s
Out[130]:
0 1
1999-03-31 SOLD_PRICE NaN
1999-06-30 SOLD_PRICE NaN
1999-09-30 SOLD_PRICE NaN
1999-12-31 SOLD_PRICE 3
2000-03-31 SOLD_PRICE 3
Name: 2, dtype: float64
In [131]: s.reset_index()
Out[131]:
0 1 2
0 1999-03-31 SOLD_PRICE NaN
1 1999-06-30 SOLD_PRICE NaN
2 1999-09-30 SOLD_PRICE NaN
3 1999-12-31 SOLD_PRICE 3
4 2000-03-31 SOLD_PRICE 3
There are many ways to drop columns:
Call reset_index()
twice and specify a column:
In [136]: s.reset_index(0).reset_index(drop=True)
Out[136]:
0 2
0 1999-03-31 NaN
1 1999-06-30 NaN
2 1999-09-30 NaN
3 1999-12-31 3
4 2000-03-31 3
Delete the column after resetting the index:
In [137]: df = s.reset_index()
In [138]: df
Out[138]:
0 1 2
0 1999-03-31 SOLD_PRICE NaN
1 1999-06-30 SOLD_PRICE NaN
2 1999-09-30 SOLD_PRICE NaN
3 1999-12-31 SOLD_PRICE 3
4 2000-03-31 SOLD_PRICE 3
In [139]: del df[1]
In [140]: df
Out[140]:
0 2
0 1999-03-31 NaN
1 1999-06-30 NaN
2 1999-09-30 NaN
3 1999-12-31 3
4 2000-03-31 3
Call drop()
after resetting:
In [144]: s.reset_index().drop(1, axis=1)
Out[144]:
0 2
0 1999-03-31 NaN
1 1999-06-30 NaN
2 1999-09-30 NaN
3 1999-12-31 3
4 2000-03-31 3
Then, after you've reset your index, just rename the columns
In [146]: df.columns = ['Date', 'Sales']
In [147]: df
Out[147]:
Date Sales
0 1999-03-31 NaN
1 1999-06-30 NaN
2 1999-09-30 NaN
3 1999-12-31 3
4 2000-03-31 3
Upvotes: 61