Reputation: 369
Suppose I have Panda DataFrame which looks following:
Name Key Val
David A 1
Roe B 2
John A 3
Nat B 4
I want to split by Key and Group by Name.
Name A B
David 1 nan
John 3 nan
Nat nan 4
Roe nan 2
Can you please suggest a way to do it?
Below is code to generate the dataframe.
import pandas as pd
# Initializing the nested list with Data-set
data = [['David','A',1],
['Roe','B',2],
['John','A',3],
['Nat','B',4]]
df = pd.DataFrame(data, columns=['Name', 'Key','Val'])
Upvotes: 2
Views: 66
Reputation: 369
import pandas as pd
data = {"Id": ["DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA"],
"Port" : [1,1,1,1,1],
"Lane" :[None,None,None,None,None],
"Key" : ["start", "start","case","case_start","end"],
"Val":[0.1,0.1,0.3,0.4,0.5]
}
df = pd.DataFrame(data,columns=['Id', 'Port','Lane','Key','Val'])
df1 = df.set_index(['Id', 'Port','Lane','Key'])['Val'].unstack()
df1 = df1.reset_index()
Below is Error generated. The reason is "There is two duplicate entry for same id". Such entry is unavoidable for datatable I'm working on. I was expecting, It will overwrite same line in table and won't throw any error like below.
raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape
Is there a way to avoid such error during Re-shaping?
Upvotes: 0
Reputation: 369
Example input data:
data = {"Id": ["DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA"],
"Port" : [1,1,1,1,1],
"Lane" :[None,None,None,None,None],
"Key" : ["start_now", "start","case","case_start","end"],
#"val": [0.000001,0.2,0.3,0.4,0.5]
"Val":[0.000001,0.2,0.3,0.4,0.5] #need to capitalize this 'val' to match dataframe definition below
}
df = pd.DataFrame(data,columns=['Id', 'Port','Lane','Key','Val'])
df1 = df.set_index(['Id', 'Port','Lane','Key'])['Val'].unstack()
df1 = df1.reset_index()
Exapected output:
Id | Port | Lane | start_now | start | case | case_start | end |
---|---|---|---|---|---|---|---|
DA0445EA | 1 | nan | 0.000001 | 0.2 | 0.3 | 0.4 | 0.5 |
Issue I'm getting:
Upvotes: 0
Reputation: 153460
Use reshaping like this, set_index
with Name and Key creating a multiIndex, then unstack
the inner most index level to create columns:
df.set_index(['Name','Key'])['Val'].unstack()
Output:
Key A B
Name
David 1.0 NaN
John 3.0 NaN
Nat NaN 4.0
Roe NaN 2.0
Upvotes: 1