Reputation: 974
I have a table like so:
Name | ID | Contact_method | Contact
sarah 1 house h1
sarah 1 mobile m1
sarah 1 email sarah@mail
bob 2 house h2
bob 2 mobile m2
bob 2 email bob@mail
jones 3 house h3
jones 3 mobile m3
jones 3 email jones@mail
jones 4 house h4
jones 4 mobile m4
jones 4 email jones2@mail
And I want it like so:
Name | ID | house | mobile | email
sarah 1 h1 m1 sarah@mail
bob 2 h2 m2 bob@mail
jones 3 h3 m3 jones@mail
jones 4 h4 m4 jones2@mail
I can already do this, but only through a very expensive pd.concat
operation iterated over all the unique IDs. Is there a simple way to do this? I've also tinkered with pivot()
and transpose()
. Note that the duplicate name is there so that I can't rely on uniqueness of column values to, say, do a join
.
Upvotes: 2
Views: 244
Reputation: 863611
I think piRSquared's solution is very nice, but if get:
ValueError: Index contains duplicate entries, cannot reshape
print (df)
Name ID Contact_method Contact
0 sarah 1 house h1
1 sarah 1 mobile m1
2 sarah 1 email sarah@mail
3 bob 2 house h2
4 bob 2 mobile m2
5 bob 2 email bob@mail
6 jones 3 house h3
7 jones 3 mobile m3
8 jones 3 email jones@mail <-for same Name,ID and Contact_method get duplicate
9 jones 3 email joe@mail <-for same Name,ID and Contact_method get duplicate
10 jones 4 house h4
11 jones 4 mobile m4
12 jones 4 email jones2@mail
use pivot_table
or groubpy
with aggregating join
:
cols = ['Name','ID','house','mobile','email']
df1 = df.pivot_table(index=['ID','Name'],
columns='Contact_method',
values='Contact',
aggfunc=','.join)
.rename_axis(None, 1)
.reset_index()
.reindex_axis(cols, axis=1)
print (df1)
Name ID house mobile email
0 sarah 1 h1 m1 sarah@mail
1 bob 2 h2 m2 bob@mail
2 jones 3 h3 m3 jones@mail,joe@mail <- join duplicates
3 jones 4 h4 m4 jones2@mail
df1 = df.groupby(['Name', 'ID', 'Contact_method'])['Contact']
.apply(','.join)
.unstack()
.rename_axis(None, 1)
.reset_index()
.reindex_axis(cols, axis=1)
print (df1)
Name ID house mobile email
0 sarah 1 h1 m1 sarah@mail
1 bob 2 h2 m2 bob@mail
2 jones 3 h3 m3 jones@mail,joe@mail <- join duplicates
3 jones 4 h4 m4 jones2@mail
Upvotes: 0
Reputation: 294516
Set the index with all columns except 'Contact_method'
, then unstack
df.set_index(
['Name', 'ID', 'Contact_method']
)['Contact'].unstack().rename_axis(None, 1).reset_index()
Name ID email house mobile
0 bob 2 bob@mail h2 m2
1 jones 3 jones@mail h3 m3
2 jones 4 jones2@mail h4 m4
3 sarah 1 sarah@mail h1 m1
Upvotes: 2
Reputation: 1405
One way is to build a contact dictionary (of dictionaries) based on ID, 'manualy'. Not sure if it is more efficient though.
people = dict()
for index, row in pd.iterrows():
ID = row['ID']
if ID not in people:
people[ID] = {'ID': ID, 'Name': row['Name']}
people[ID][row['Contact_method']] = row['Contact']
print pandas.DataFrame(people).transpose()
And output is:
ID Name email house mobile
1 1 sarah sarah@mail h1 m1
2 2 bob bob@mail h2 m2
3 3 jones jones@mail h3 m3
4 4 jones jones2@mail h4 m4
Upvotes: 0