Reputation: 1282
I am trying to arrange duplicate data into one row using Python.
The "Original" dataframe has duplicate data.
The "Goal" is what I am trying to accomplish.
How do I go about doing this?
If I use Pandas, how would it look like?
By the way, I am getting original data from csv file.
PatientID Model# Ear SerNum FName LName PName PPhone
P99999 300 Left 1234567 John Doe Jane Doe (999) 111-2222
P99999 400 Right 2345678 John Doe Jane Doe (999) 111-2222
PID ModleL SerNumL ModelR SerNumR FName LName PName PPhone
P99999 300 1234567 400 2345678 John Doe J.Doe (999) 111-2222
Upvotes: 1
Views: 92
Reputation: 323226
This is more like a pivot
problem , so I use pivot_table
here
s=df.pivot_table(index=['PatientID','FName','LName','PName','PPhone'],columns='Ear',values=['Model#','SerNum'],aggfunc='first')
s.columns=s.columns.map(' '.join)
s.reset_index(inplace=True)
s
PatientID FName LName ... Model# Right SerNum Left SerNum Right
0 P99999 John Doe ... 400 1234567 2345678
[1 rows x 9 columns]
Upvotes: 1
Reputation: 42886
First we split our data into left
and right
. After that we use pandas.DataFrame.merge
to bring our data back together and give the correct suffixes
:
df_L = df[df.Ear == 'Left'].drop('Ear',axis=1)
df_R = df[df.Ear == 'Right'].drop('Ear', axis=1)
print(df_L, '\n')
print(df_R)
PatientID Model# SerNum FName LName PName PPhone
0 P99999 300 1234567 John Doe Jane Doe (999) 111-2222
PatientID Model# SerNum FName LName PName PPhone
1 P99999 400 2345678 John Doe Jane Doe (999) 111-2222
Now we can merge back and give the correct suffixes:
df = pd.merge(df_L, df_R.iloc[:, :3], on = 'PatientID', suffixes=['Left', 'Right'])
print(df)
PatientID Model#Left SerNumLeft FName LName PName PPhone \
0 P99999 300 1234567 John Doe Jane Doe (999) 111-2222
Model#Right SerNumRight
0 400 2345678
Upvotes: 3
Reputation: 1528
Best source is the official source:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
You may also want to learn about multiindex, levels, etc.
I prefer join:
import pandas as pd
columns = ['PatientID', 'Model#', 'Ear', 'SerNum', 'FName', 'LName', 'PName', 'PPhone']
data = [[
'P99999', '300', 'Left', '1234567', 'John', 'Doe', 'Jane Doe', '(999) 111-2222'],
['P99999', '400', 'Right', '2345678', 'John', 'Doe', 'Jane Doe', '(999) 111-2222']]
df = pd.DataFrame(data=data, columns=columns)
df = df.set_index('PatientID')
df = df[df['Ear'] == 'Left'].drop('Ear', axis=1).join(df[df['Ear'] == 'Right'].drop('Ear', axis=1), lsuffix='_left', rsuffix='_right').reset_index()
Output:
PatientID Model#_left SerNum_left ... LName_right PName_right PPhone_right
0 P99999 300 1234567 ... Doe Jane Doe (999) 111-2222
EDIT:
1. Fixed, forgot to drop the column :)
2. Now with your data :)
Upvotes: 1