Reputation: 471
Hi I am having a problem sorting some data in a pandas DataFrame. The format of the data is different to what I am used to, and I have no idea how to make it work. The data I want to sort is on this format:
[['2016-05-23', 'name1', 'data1'],
['2016-05-23', 'name2', 'data2'],
['2016-05-24', 'name1', 'data1'],
['2016-05-24', 'name2', 'data2'],
['2016-05-25', 'name1', 'data1'],
['2016-05-25', 'name2', 'data2'],
['2016-05-26', 'name1', 'data1'],
['2016-05-26', 'name2', 'data2'],
['2016-05-27', 'name1', 'data1'],
['2016-05-27', 'name2', 'data2']]
What I want to do is to get it sortet on date, and with the name as colum names.
[['2016-05-23, 'data1', 'data2'],
['2016-05-24', 'data1', 'data2'],
['2016-05-25', 'data1', 'data2'] and so on...]
Is it possible to do this with some specific pandas command, I would also like the solution to work with more that just the two name columns. I have no idea how to do this and would appriciate any help.
Upvotes: 3
Views: 309
Reputation: 863031
I think you need pivot
:
import pandas as pd
df = pd.DataFrame([['2016-05-23', 'name1', 'data1'],
['2016-05-23', 'name2', 'data2'],
['2016-05-24', 'name1', 'data1'],
['2016-05-24', 'name2', 'data2'],
['2016-05-25', 'name1', 'data1'],
['2016-05-25', 'name2', 'data2'],
['2016-05-26', 'name1', 'data1'],
['2016-05-26', 'name2', 'data2'],
['2016-05-27', 'name1', 'data1'],
['2016-05-27', 'name2', 'data2']], columns = ['a','b','c'])
print (df)
a b c
0 2016-05-23 name1 data1
1 2016-05-23 name2 data2
2 2016-05-24 name1 data1
3 2016-05-24 name2 data2
4 2016-05-25 name1 data1
5 2016-05-25 name2 data2
6 2016-05-26 name1 data1
7 2016-05-26 name2 data2
8 2016-05-27 name1 data1
9 2016-05-27 name2 data2
#convert column a to datetime
df['a'] = pd.to_datetime(df.a)
print (df.pivot(index='a', columns='b', values='c'))
b name1 name2
a
2016-05-23 data1 data2
2016-05-24 data1 data2
2016-05-25 data1 data2
2016-05-26 data1 data2
2016-05-27 data1 data2
And then if you need sort_values
e.g. by column name1
:
import pandas as pd
df = pd.DataFrame([['2016-05-23', 'name1', 9],
['2016-05-23', 'name2', 4],
['2016-05-24', 'name1', 5],
['2016-05-24', 'name2', 1],
['2016-05-25', 'name1', 5],
['2016-05-25', 'name2', 7],
['2016-05-26', 'name1', 10],
['2016-05-26', 'name2', 7],
['2016-05-27', 'name1', 0],
['2016-05-27', 'name2', 1]], columns = ['a','b','c'])
print (df)
a b c
0 2016-05-23 name1 9
1 2016-05-23 name2 4
2 2016-05-24 name1 5
3 2016-05-24 name2 1
4 2016-05-25 name1 5
5 2016-05-25 name2 7
6 2016-05-26 name1 10
7 2016-05-26 name2 7
8 2016-05-27 name1 0
9 2016-05-27 name2 1
print (df.pivot(index='a', columns='b', values='c').sort_values('name1'))
b name1 name2
a
2016-05-27 0 1
2016-05-24 5 1
2016-05-25 5 7
2016-05-23 9 4
2016-05-26 10 7
Sometimes pivot
doesnt work, then use pivot_table
:
print (df.pivot_table(index='a', columns='b', values='c'))
But pivot_table
uses aggfunc
, default is aggfunc=np.mean
if duplicates. Better explanation with sample is here and in docs.
Last you can reset_index
and rename_axis
(new in pandas
0.18.0
):
print (df.pivot(index='a', columns='b', values='c')
.reset_index()
.rename_axis(None, axis=1))
a name1 name2
0 2016-05-23 9 4
1 2016-05-24 5 1
2 2016-05-25 5 7
3 2016-05-26 10 7
4 2016-05-27 0 1
#pandas bellow 0.18.0
df1 = df.pivot(index='a', columns='b', values='c').reset_index()
df1.columns.name = None
print (df1)
a name1 name2
0 2016-05-23 9 4
1 2016-05-24 5 1
2 2016-05-25 5 7
3 2016-05-26 10 7
4 2016-05-27 0 1
Upvotes: 2