Reputation: 63
I have pandas dataframe containing 3 column: (a) Name (b) Date and (c) Height (in cm). Height is measured for different kids (Name) on different date. I wish to separate Date column so that I can see height of each kid on a particular date.
For example, Our data is like this:
df = pd.DataFrame({
"Name": ["b", "c", "a", "b", "a", "c","a", "b", "c"],
"Date": ["1", "1", "1", "2", "2", "2","3", "3", "3"],
"Height": ["10", "11", "12", "11", "14", "13","15", "12", "14"]
})
Wish to represent like this: enter image description here
How to do it?
Upvotes: 1
Views: 108
Reputation: 862581
You need use pivot
or set_index
+ unstack
for reshape:
df1 = df.pivot(index='Name', columns='Date', values='Height')
print (df1)
Date 1 2 3
Name
a 12 14 15
b 10 11 12
c 11 13 14
df1 = df.set_index(['Name', 'Date'])['Height'].unstack(fill_value=0)
print (df1)
Date 1 2 3
Name
a 12 14 15
b 10 11 12
c 11 13 14
But if get:
ValueError: Index contains duplicate entries, cannot reshape
there are duplicates in columns created new index and columns:
df = pd.DataFrame({
"Name": ["b", "b", "a", "b", "a", "c","a", "b", "c"],
"Date": ["1", "1", "1", "2", "2", "2","3", "3", "3"],
"Height": ["10", "11", "12", "11", "14", "13","15", "12", "14"]
})
print (df)
Date Height Name
0 1 10 b <-duplicate Date=1, Name=b
1 1 11 b <-duplicate Date=1, Name=b
2 1 12 a
3 2 11 b
4 2 14 a
5 2 13 c
6 3 15 a
7 3 12 b
8 3 14 c
Then need pivot_table
with some aggregate function like mean
, sum
or use groupby
+ aggregate function
+ unstack
:
But if get:
DataError: No numeric types to aggregate
it means column Height
for aggregate is not numeric, so use astype
for cast to int
or float
:
df['Height'] = df['Height'].astype(int)
df1 = df.pivot_table(index='Name',
columns='Date',
values='Height',
aggfunc='mean', fill_value=0)
print (df1)
Date 1 2 3
Name
a 12.0 14 15
b 10.5 11 12
c 0.0 13 14
df1 = df.groupby(['Name', 'Date'])['Height'].mean().unstack(fill_value=0)
print (df1)
Date 1 2 3
Name
a 12.0 14.0 15.0
b 10.5 11.0 12.0
c 0.0 13.0 14.0
For columns add reset_index
+ rename_axis
:
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
Name 1 2 3
0 a 12.0 14.0 15.0
1 b 10.5 11.0 12.0
2 c 0.0 13.0 14.0
Upvotes: 1
Reputation: 852
What you want is a MultiIndex data frame, say you have a df like so:
df = pd.DataFrame({
"Name": ["a", "b", "c", "a", "b", "c"],
"Date": ["1", "2", "3", "4", "5", "6"],
"Height": ["10", "15", "12", "14", "17", "13"]
})
You could actually set the index to both Name and Date, in that order:
df.set_index(["Name", "Date"], inplace=True)
Which yields this df:
Height
Name Date
a 1 10
b 2 15
c 3 12
a 4 14
b 5 17
c 6 13
Here both Name and Date are indexes, so you could use .loc on the Name column, to get a df of Dates and Heights for a particular name, for instance:
df.loc["a"]
Yields:
Height
Date
1 10
4 14
Upvotes: 0