Paul
Paul

Reputation: 13

Row merging over strings in a dataframe?

I have a phone directory that stores Department, Title, Email and Extension on seperate rows, the things being in common are First and Last Name. I have combined First and Last Name as a Key, and would like to merge the rows to where you would end up with a single row with the Name, Title, Department, Email and Extension.

I have tried creating a dictionary for each key, but I have not had any luck with the actual merging. This is where I am on coding it. I had to clean the data first to get the appropriate columns.

the table looks like the following:

LastName  FirstName  Department Title   Extension Email           Key
Doe       Jane       HR         Officer 0000                      Jane Doe
Doe       Jane       HR         Officer           [email protected]  Jane Doe
df = pd.read_excel("Directory.xlsx")
df = df.drop(columns = ["group_name","editable","id","contact_type","id2","account_id","server_uuid","picture",
             "dial_prefix","name","label","id3","transfer_name","value","key","primary","label4","id5",
             "type","display","group_name6"])

df = df.rename(index = str, columns = {"last_name":"Last Name","first_name":"First Name","location":"Department",
               "title":"Title","dial":"Extension","address":"Email"})

df["Key"] = df["First Name"].map(str) + " " + df["Last Name"].map(str)
LastName FirstName Department Title   Extension Email          Key  
Doe      Jane      HR         Officer 0000      [email protected] Jane Doe

Upvotes: 0

Views: 52

Answers (1)

Erfan
Erfan

Reputation: 42886

First we use DataFrame.replace to replace the whitespaces with NaN. Then use DataFrame.groupby and apply fillna with methods backfill and forwardfill to fill in your empty spaces. Finally we can use drop_duplicates to get the single row as wanted.

df['Key'] = df['FirstName'] + ' ' + df['LastName']
df.replace('', np.NaN, inplace=True)
df = df.groupby('Key').apply(lambda x: x.fillna(method='ffill').fillna(method='bfill')).drop_duplicates()

print(df)
  LastName FirstName Department    Title Extension           Email       Key
0      Doe      Jane         HR  Officer      0000  [email protected]  Jane Doe

Upvotes: 1

Related Questions