Xingfang Lee
Xingfang Lee

Reputation: 63

Python, Rearranging data in pandas

I have pandas dataframe containing 3 column: (a) Name (b) Date and (c) Height (in cm). Height is measured for different kids (Name) on different date. I wish to separate Date column so that I can see height of each kid on a particular date.

For example, Our data is like this:

df = pd.DataFrame({
"Name": ["b", "c", "a", "b", "a", "c","a", "b", "c"], 
"Date": ["1", "1", "1", "2", "2", "2","3", "3", "3"], 
"Height": ["10", "11", "12", "11", "14", "13","15", "12", "14"]

})

Wish to represent like this: enter image description here

How to do it?

Upvotes: 1

Views: 108

Answers (2)

jezrael
jezrael

Reputation: 862581

You need use pivot or set_index + unstack for reshape:

df1 = df.pivot(index='Name', columns='Date', values='Height')
print (df1)
Date   1   2   3
Name            
a     12  14  15
b     10  11  12
c     11  13  14

df1 = df.set_index(['Name', 'Date'])['Height'].unstack(fill_value=0)
print (df1)
Date   1   2   3
Name            
a     12  14  15
b     10  11  12
c     11  13  14

But if get:

ValueError: Index contains duplicate entries, cannot reshape

there are duplicates in columns created new index and columns:

df = pd.DataFrame({
"Name": ["b", "b", "a", "b", "a", "c","a", "b", "c"], 
"Date": ["1", "1", "1", "2", "2", "2","3", "3", "3"], 
"Height": ["10", "11", "12", "11", "14", "13","15", "12", "14"]
})
print (df)
  Date  Height Name
0    1      10    b <-duplicate Date=1, Name=b
1    1      11    b <-duplicate Date=1, Name=b
2    1      12    a
3    2      11    b
4    2      14    a
5    2      13    c
6    3      15    a
7    3      12    b
8    3      14    c

Then need pivot_table with some aggregate function like mean, sum or use groupby + aggregate function + unstack:

But if get:

DataError: No numeric types to aggregate

it means column Height for aggregate is not numeric, so use astype for cast to int or float:

df['Height'] = df['Height'].astype(int)

df1 = df.pivot_table(index='Name', 
                     columns='Date', 
                     values='Height', 
                     aggfunc='mean', fill_value=0)
print (df1)
Date     1   2   3
Name              
a     12.0  14  15
b     10.5  11  12
c      0.0  13  14

df1 = df.groupby(['Name', 'Date'])['Height'].mean().unstack(fill_value=0)
print (df1)
Date     1     2     3
Name                  
a     12.0  14.0  15.0
b     10.5  11.0  12.0
c      0.0  13.0  14.0

For columns add reset_index + rename_axis:

df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
  Name     1     2     3
0    a  12.0  14.0  15.0
1    b  10.5  11.0  12.0
2    c   0.0  13.0  14.0

Upvotes: 1

Henrique Coura
Henrique Coura

Reputation: 852

What you want is a MultiIndex data frame, say you have a df like so:

df = pd.DataFrame({
    "Name": ["a", "b", "c", "a", "b", "c"], 
    "Date": ["1", "2", "3", "4", "5", "6"], 
    "Height": ["10", "15", "12", "14", "17", "13"]
})

You could actually set the index to both Name and Date, in that order:

df.set_index(["Name", "Date"], inplace=True)

Which yields this df:

          Height
Name Date       
a    1        10
b    2        15
c    3        12
a    4        14
b    5        17
c    6        13

Here both Name and Date are indexes, so you could use .loc on the Name column, to get a df of Dates and Heights for a particular name, for instance:

df.loc["a"]

Yields:

     Height
Date       
1        10
4        14

Upvotes: 0

Related Questions