Reputation: 25
I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({"Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
                   "Product": ["A", "B", "C", "B", "C", "B"]})
And I would like to reach a result like this:
Alice A, B, C
Mallory B, C
Bob B
Does anyone know how to do this using Python 3?
Upvotes: 0
Views: 200
Reputation: 8033
You can use the agg function to join the items after grouping.
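With pandas 0.25+
import numpy as np
# Turn the "Nan" placeholder strings into real NaN, then forward-fill the client names
df = df.replace("Nan", np.nan).ffill()
# Named aggregation takes (column, function) tuples on the grouped DataFrame
df.groupby('Client', sort=False).agg(Product=('Product', ','.join)).reset_index()
With pandas below 0.25
df = df.replace("Nan", np.nan).ffill()
# A list of (name, function) tuples names the output column
df.groupby('Client', sort=False)['Product'].agg([('Product', ','.join)]).reset_index()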
Output
    Client Product
0    Alice   A,B,C
1  Mallory     B,C
2      Bob       B
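If the version-specific agg spellings are a concern, a plain Series aggregation is one way to sidestep them. The following is only a minimal sketch, not the exact method above: it uses mask instead of replace to blank out the "Nan" strings, and the helper name join_products and its parameters are made up for illustration. It joins with ", " to match the spacing shown in the question.

```python
import pandas as pd


def join_products(df, placeholder="Nan", sep=", "):
    """Forward-fill placeholder client names, then join each client's products."""
    # mask() turns every placeholder entry into a real missing value,
    # so ffill() can copy the last real client name downwards.
    clients = df["Client"].mask(df["Client"] == placeholder).ffill()
    # sort=False keeps the clients in order of first appearance.
    joined = df["Product"].groupby(clients, sort=False).agg(sep.join)
    return joined.reset_index()


df = pd.DataFrame({"Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
                   "Product": ["A", "B", "C", "B", "C", "B"]})
result = join_products(df)
print(result)
```

Because the grouping key is built separately, the original df is left unmodified.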
Upvotes: 1
Reputation: 833
How about something like this?
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({"Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
                   "Product": ["A", "B", "C", "B", "C", "B"]})
last_client = None
data = defaultdict(list)
for _, row in df.iterrows():
    # I'd hazard a guess you want np.nan rather than the string comparison here
    if row.Client != last_client and row.Client != "Nan":
        last_client = row.Client
    data[last_client].append(row.Product)
print(data)
defaultdict(<class 'list'>, {'Alice': ['A', 'B', 'C'], 'Mallory': ['B', 'C'], 'Bob': ['B']})
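Following up on the comment in the loop: if the gaps turn out to be real missing values rather than the string "Nan", the check could be widened along these lines. This is just a sketch under that assumption; the function name collect_products and its placeholder parameter are hypothetical, and pd.isna is used so that both None and np.nan count as gaps.

```python
from collections import defaultdict

import numpy as np
import pandas as pd


def collect_products(df, placeholder="Nan"):
    """Group products under the most recent real client name."""
    data = defaultdict(list)
    last_client = None
    for client, product in zip(df["Client"], df["Product"]):
        # A row names a new client only if the value is neither missing
        # nor the placeholder string.
        if not pd.isna(client) and client != placeholder:
            last_client = client
        # Note: rows before the first real client end up under the key None.
        data[last_client].append(product)
    return dict(data)


mixed = pd.DataFrame({"Client": ["Alice", np.nan, "Nan", "Mallory", None, "Bob"],
                      "Product": ["A", "B", "C", "B", "C", "B"]})
collected = collect_products(mixed)
print(collected)
```

Iterating over the two columns with zip avoids building a Series per row, which iterrows does.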
Upvotes: 0
Reputation: 7806
It looks like you have the output of a groupby operation, with the "Nan"s sitting where the repeated client names used to be. You will need to get it back into that grouped state to do anything useful with it.
First, turn the string "Nan"s into actual NaNs.
import numpy as np
df.replace("Nan", np.nan, inplace=True)
Then ffill can do its work, filling each gap with the client name above it.
df.ffill(axis=0, inplace=True)
Then, to get the output format (here is where the magic happens):
for group, data in df.groupby(df.Client):
    print(group, data.Product.tolist())
Alice ['A', 'B', 'C']
Bob ['B']
Mallory ['B', 'C']
I'll leave the f-string formatting as homework.
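For anyone who wants to check their homework, the f-string step might look like the sketch below. It assumes the sample data from the question, and it passes sort=False to groupby, which the loop above does not, so that the clients come out in their original order rather than alphabetically.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
                   "Product": ["A", "B", "C", "B", "C", "B"]})

# Replace the placeholder strings with real NaN and forward-fill the names.
df["Client"] = df["Client"].replace("Nan", np.nan).ffill()

lines = []
for client, group in df.groupby("Client", sort=False):
    # Join this client's products with ", " and put the name in front.
    products = ", ".join(group["Product"])
    lines.append(f"{client} {products}")

print("\n".join(lines))
```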
Upvotes: 0