Reputation: 13
I have a phone directory that stores Department, Title, Email and Extension on seperate rows, the things being in common are First and Last Name. I have combined First and Last Name as a Key, and would like to merge the rows to where you would end up with a single row with the Name, Title, Department, Email and Extension.
I have tried creating a dictionary for each key, but I have not had any luck with the actual merging. This is where I am on coding it. I had to clean the data first to get the appropriate columns.
the table looks like the following:
LastName FirstName Department Title Extension Email Key
Doe Jane HR Officer 0000 Jane Doe
Doe Jane HR Officer [email protected] Jane Doe
df = pd.read_excel("Directory.xlsx")
df = df.drop(columns = ["group_name","editable","id","contact_type","id2","account_id","server_uuid","picture",
"dial_prefix","name","label","id3","transfer_name","value","key","primary","label4","id5",
"type","display","group_name6"])
df = df.rename(index = str, columns = {"last_name":"Last Name","first_name":"First Name","location":"Department",
"title":"Title","dial":"Extension","address":"Email"})
df["Key"] = df["First Name"].map(str) + " " + df["Last Name"].map(str)
LastName FirstName Department Title Extension Email Key
Doe Jane HR Officer 0000 [email protected] Jane Doe
Upvotes: 0
Views: 52
Reputation: 42886
First we use DataFrame.replace
to replace the whitespaces with NaN
. Then use DataFrame.groupby
and apply fillna
with methods backfill
and forwardfill
to fill in your empty spaces. Finally we can use drop_duplicates
to get the single row as wanted.
df['Key'] = df['FirstName'] + ' ' + df['LastName']
df.replace('', np.NaN, inplace=True)
df = df.groupby('Key').apply(lambda x: x.fillna(method='ffill').fillna(method='bfill')).drop_duplicates()
print(df)
LastName FirstName Department Title Extension Email Key
0 Doe Jane HR Officer 0000 [email protected] Jane Doe
Upvotes: 1