Reputation: 382
I want to accumulate the values row by row when grouping by Name, so that the last row of each group holds all of the group's values. For instance, at the last occurrence of the name Walter (row 2), I want Dog + ", " + Cat in Col1 and Beer + ", " + Wine in Col3. There are a lot of columns, so I would like to select them by index/column position instead of by column name.
+------+---------+-------+
| Col1 | Name | Col3 |
+------+---------+-------+
| Dog | Walter | Beer |
| Cat | Walter | Wine |
| Dog | Alfonso | Cider |
| Dog | Alfonso | Cider |
| Dog | Alfonso | Vodka |
+------+---------+-------+
This is the output I want:
+---------------+---------------------------+---------------------+
| Col1 | Name | Col3 |
+---------------+---------------------------+---------------------+
| Dog | Walter | Beer |
| Dog, Cat | Walter, Walter | Beer, Wine |
| Dog | Alfonso | Cider |
| Dog, Dog | Alfonso, Alfonso | Cider, Cider |
| Dog, Dog, Dog | Alfonso, Alfonso, Alfonso | Cider, Cider, Vodka |
+---------------+---------------------------+---------------------+
This is what I have tried (but does not work):
for i in df:
    if df.loc[i,1] == df.loc[i+1,1]:
        df.loc[i,0] + ", " + df.loc[i+1,0]
    else:
        df.loc[i+1,0]
I read that iterating over rows in pandas with a for-loop is frowned upon, so I would like to get the output by using vectorization or apply (or some other efficient way).
Upvotes: 1
Views: 107
Reputation: 75130
Here is another way, using accumulate on the index together with the df.agg method:
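from itertools import accumulate
import numpy as np
import pandas as pd

def fun(a):
    # One single-element list per row label of the group
    l = [[i] for i in a.index]
    # Running concatenation of labels: [0], [0, 1], [0, 1, 2], ...
    acc = list(accumulate(l, lambda x, y: np.concatenate([x, y])))
    # Join each growing slice column-wise; one output row per slice
    return pd.concat([a.loc[idx].agg(','.join) for idx in acc], axis=1).T

out = pd.concat([fun(v) for k, v in df.groupby('Name', sort=False)])
print(out)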
Col1 Name Col3
0 Dog Walter Beer
1 Dog,Cat Walter,Walter Beer,Wine
0 Dog Alfonso Cider
1 Dog,Dog Alfonso,Alfonso Cider,Cider
2 Dog,Dog,Dog Alfonso,Alfonso,Alfonso Cider,Cider,Vodka
You can add reset_index(drop=True) at the end to reset the index.
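As a minimal sketch of that last step: the small frame below is a hand-built stand-in for out (an assumption for illustration, not the real result), with an index that restarts for each group.

```python
import pandas as pd

# Hand-built stand-in for `out`: the index restarts at 0 for each group.
out = pd.DataFrame(
    {"Col1": ["Dog", "Dog,Cat", "Dog"], "Col3": ["Beer", "Beer,Wine", "Cider"]},
    index=[0, 1, 0],
)

# drop=True discards the old index instead of keeping it as a new column.
out = out.reset_index(drop=True)
print(out)
```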
Upvotes: 2
Reputation: 13898
If you only care about the last-row results of Col1 and Col3, try this:
df.groupby('Name').agg(', '.join)
Result:
                  Col1                 Col3
Name
Alfonso  Dog, Dog, Dog  Cider, Cider, Vodka
Walter        Dog, Cat           Beer, Wine
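Since the question asks for column positions rather than names, the same aggregation can probably be written as below. This is only a sketch: it rebuilds the question's frame, assumes the grouping column sits at position 1, and passes sort=False to keep the groups in their original order.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
})

# Look up the grouping column by position (assumed to be position 1).
name_col = df.columns[1]

# sort=False keeps groups in order of first appearance (Walter, then Alfonso).
last = df.groupby(name_col, sort=False).agg(", ".join)
print(last)
```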
Upvotes: 1
Reputation: 12523
What you're basically trying to do is run a cumulative aggregation function on each group. pandas has cumsum for regular addition but doesn't support custom cumulative functions. For this you may want to use some NumPy functions:
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["D", "C", "D", "D", "D"], "Name": ["W", "W", "A", "A", "A"],
                   "col3": ["B", "W", "C", "C", "V"] })

def ser_accum(op,ser):
    u_op = np.frompyfunc(op, 2, 1) # two inputs, one output
    # np.object was removed in NumPy 1.24; the builtin object works everywhere
    return u_op.accumulate(ser, dtype=object)

def plus(x,y):
    return x + "," + y

def accum(df):
    # Replace every column of the group with its running concatenation
    for col in df.columns:
        df[col] = ser_accum(plus, df[col])
    return df

df.groupby("Name").apply(accum)
Here's the result:
col1 Name col3
0 D W B
1 D,C W,W B,W
2 D A C
3 D,D A,A C,C
4 D,D,D A,A,A C,C,V
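To see the accumulate step in isolation, here is a small sketch on a plain NumPy array rather than a grouped column. The names u_plus, values and running are made up for this illustration.

```python
import numpy as np

def plus(x, y):
    return x + "," + y

# Wrap the Python function as a ufunc taking two inputs and giving one output.
u_plus = np.frompyfunc(plus, 2, 1)

# Object dtype is needed because the elements are Python strings.
values = np.array(["D", "D", "D"], dtype=object)

# accumulate applies plus cumulatively, keeping every intermediate result.
running = u_plus.accumulate(values, dtype=object)
print(running.tolist())  # ['D', 'D,D', 'D,D,D']
```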
Upvotes: 3
Reputation: 29635
You can use groupby and cumsum. If you don't mind a trailing comma and space (depending on what you do with the result afterwards), you can do:
print (df.groupby('Name')[['Col1', 'Col3']].apply(lambda x: (x + ', ').cumsum()))
             Col1                  Col3
0            Dog,                 Beer,
1       Dog, Cat,           Beer, Wine,
2            Dog,                Cider,
3       Dog, Dog,         Cider, Cider,
4  Dog, Dog, Dog,  Cider, Cider, Vodka,
But if you want to remove the trailing comma and space, slice them off each column with str[:-2]:
print (df.groupby('Name')[['Col1', 'Col3']].apply(lambda x: (x + ', ').cumsum())\
         .apply(lambda x: x.str[:-2]))
            Col1                 Col3
0            Dog                 Beer
1       Dog, Cat           Beer, Wine
2            Dog                Cider
3       Dog, Dog         Cider, Cider
4  Dog, Dog, Dog  Cider, Cider, Vodka
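A possible variant picks the columns by position, as the question asks, and never produces the trailing separator. It swaps cumsum for itertools.accumulate inside a per-column transform. This is a sketch only: running_join is a hypothetical helper, and the positions (1 for the grouping column, 0 and 2 for the columns to accumulate) are assumptions taken from the sample frame.

```python
from itertools import accumulate

import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
})

def running_join(s, sep=", "):
    """Hypothetical helper: each entry joins all values seen so far in s."""
    return pd.Series(list(accumulate(s, lambda x, y: x + sep + y)), index=s.index)

key = df.iloc[:, 1]      # grouping column, selected by position
out = df.copy()
for pos in (0, 2):       # columns to accumulate, selected by position
    col = df.columns[pos]
    # transform returns one value per input row, aligned to the original index
    out[col] = df.groupby(key, sort=False)[col].transform(running_join)

print(out)
```

The grouping column itself is left unchanged here; add position 1 to the loop if the repeated names from the question's desired output are wanted too.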
Upvotes: 2