Reputation: 3362
I know you can do this with a series, but I can't seem to do this with a dataframe.
I have the following:
   name   note                     age
0  jon    likes beer on tuesdays   10
1  jon    likes beer on tuesdays
2  steve  tonight we dine in heck  20
3  steve  tonight we dine in heck
I am trying to produce the following:
   name   note                     age
0  jon    likes beer on tuesdays   10
1  jon    likes beer on tuesdays   10
2  steve  tonight we dine in heck  20
3  steve  tonight we dine in heck  20
I know how to do this with string values using groupby and join, but that only works on strings. I'm having issues converting the entire age column to a string data type in the dataframe.
Any suggestions?
Upvotes: 1
Views: 143
Reputation: 862641
Use GroupBy.first with GroupBy.transform if you want to repeat the first value of each group on every row:
g = df.groupby('name')
df['note'] = g['note'].transform(' '.join)
df['age'] = g['age'].transform('first')
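As a minimal sketch of the age part only, the snippet below rebuilds the frame from the question and fills the gaps. It assumes the blank ages are stored as NaN, which the question does not state; if they are empty strings, converting the column first (for example with pd.to_numeric(df['age'], errors='coerce')) is probably needed.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; missing ages are ASSUMED to be NaN.
df = pd.DataFrame({
    "name": ["jon", "jon", "steve", "steve"],
    "note": ["likes beer on tuesdays"] * 2 + ["tonight we dine in heck"] * 2,
    "age": [10, np.nan, 20, np.nan],
})

# 'first' takes the first non-null value in each group;
# transform broadcasts that value back to every row of the group.
df["age"] = df.groupby("name")["age"].transform("first")

print(df)
```

The note column is left alone here, since transform(' '.join) would concatenate every note within a group rather than keep each row's original text.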
If you need to process multiple columns, that is, all numeric columns with first and all string columns with join, you can generate a dictionary mapping column names to functions, pass it to GroupBy.agg, and finally use DataFrame.join:
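import numpy as np
# numeric columns get 'first'; the remaining columns (except the key) get joined
cols1 = df.select_dtypes(np.number).columns
cols2 = df.columns.difference(cols1).difference(['name'])
d1 = dict.fromkeys(cols2, lambda x: ' '.join(x))
d2 = dict.fromkeys(cols1, 'first')
d = {**d1, **d2}
# aggregate once per group, then map the result back onto the original rows
df1 = df[['name']].join(df.groupby('name').agg(d), on='name')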
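The dictionary approach can be tried end to end with the sketch below. The input frame is hypothetical: the notes are split across rows (unlike the question's data) so the effect of the join is visible, and the missing ages are again assumed to be NaN. The names num_cols, str_cols, agg_map and per_group are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical input: notes split across rows, missing ages as NaN.
df = pd.DataFrame({
    "name": ["jon", "jon", "steve", "steve"],
    "note": ["likes beer", "on tuesdays", "tonight we dine", "in heck"],
    "age": [10, np.nan, 20, np.nan],
})

# Numeric columns are reduced with 'first', the rest (minus the key) with a join.
num_cols = df.select_dtypes(np.number).columns
str_cols = df.columns.difference(num_cols).difference(["name"])
agg_map = {
    **dict.fromkeys(str_cols, lambda s: " ".join(s)),
    **dict.fromkeys(num_cols, "first"),
}

# One row per name, then joined back so every original row gets its group's values.
per_group = df.groupby("name").agg(agg_map)
result = df[["name"]].join(per_group, on="name")

print(result)
```

Because the aggregation yields one row per name, joining on name is what restores the original row count; the original index of df is preserved in result.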
Upvotes: 2