Mâncio

Reputation: 25

Pandas set column values as row

I have a df like this:

df = pd.DataFrame( {"Client" : ["Alice", "Nan", "Nan", "Mallory", "Nan" , "Bob"] , 
    "Product" : ["A", "B", "C", "B", "C", "B"] } )

And I would like to reach a result like this:

Alice   A, B, C
Mallory B, C
Bob     B

Does anyone know how to do this in Python 3?

Upvotes: 0

Views: 200

Answers (3)

moys

Reputation: 8033

You can use the agg function to join the items after grouping.

With pandas 0.25+

import numpy as np
df = df.replace("Nan", np.nan).ffill()
df.groupby('Client', sort=False).agg(Product=('Product', ','.join)).reset_index()

With pandas below 0.25

df = df.replace("Nan", np.nan).ffill()
df.groupby('Client', sort=False)['Product'].agg([('Product', ','.join)]).reset_index()

Output

    Client  Product
0   Alice   A,B,C
1   Mallory B,C
2   Bob     B
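
If the goal is the exact "A, B, C" spacing from the question, a minimal sketch along the same lines could look like the following. It assumes the missing clients are the literal string "Nan", as in the question's sample data; the names `clients` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
    "Product": ["A", "B", "C", "B", "C", "B"],
})

# Treat the literal string "Nan" as missing, then carry the last real client forward.
clients = df["Client"].replace("Nan", np.nan).ffill()

# Group the products by the filled client labels, keeping first-appearance order,
# and join each group's products with ", ".
result = df["Product"].groupby(clients, sort=False).agg(", ".join)
print(result)
```

Here `sort=False` keeps the clients in the order they first appear instead of sorting them alphabetically.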

Upvotes: 1

voglster

Reputation: 833

How about something like this?

import pandas as pd
from collections import defaultdict

df = pd.DataFrame( {"Client" : ["Alice", "Nan", "Nan", "Mallory", "Nan" , "Bob"] , 
    "Product" : ["A", "B", "C", "B", "C", "B"] } )

last_client = None
data = defaultdict(list)
for _, row in df.iterrows():
    # I'd hazard a guess that you want to compare against np.nan here, not the string "Nan"
    if row.Client != last_client and row.Client != "Nan":
        last_client = row.Client
    data[last_client].append(row.Product)

print(data)

defaultdict(<class 'list'>, {'Alice': ['A', 'B', 'C'], 'Mallory': ['B', 'C'], 'Bob': ['B']})
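
If the column holds real missing values rather than the string "Nan", the same loop might be sketched like this. This variant is an assumption about the real data, not something the question confirms; it uses `pd.isna` for the check and iterates over the two columns directly instead of calling `iterrows`.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Assumed variant of the question's data, with real NaN instead of the string "Nan".
df = pd.DataFrame({
    "Client": ["Alice", np.nan, np.nan, "Mallory", np.nan, "Bob"],
    "Product": ["A", "B", "C", "B", "C", "B"],
})

last_client = None
data = defaultdict(list)
for client, product in zip(df["Client"], df["Product"]):
    # A non-missing value starts a new client; a missing one reuses the previous client.
    if not pd.isna(client):
        last_client = client
    data[last_client].append(product)

print(dict(data))
```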

Upvotes: 0

Back2Basics

Reputation: 7806

It looks like you have the output of a groupby operation, where the "Nan" strings stand in for the repeated group labels. You will need to restore those labels before you can do anything useful with it.

First, turn the "Nan" strings into actual NaN values.

import numpy as np
df.replace("Nan", np.nan, inplace=True)

Then ffill can work.

df.ffill(axis=0, inplace=True)

Then, to get the output format (here is where the magic happens):

for group, data in df.groupby(df.Client): 
    print(group, data.Product.tolist())

Alice ['A', 'B', 'C']
Bob ['B']
Mallory ['B', 'C']

I'll leave the f-string formatting as homework.
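
For anyone who wants to check their homework, one possible f-string version is sketched below. It assumes a left-aligned client column of width 8 to mimic the layout in the question, and passes `sort=False` so the clients stay in their original order; the `lines` list is only an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Client": ["Alice", "Nan", "Nan", "Mallory", "Nan", "Bob"],
    "Product": ["A", "B", "C", "B", "C", "B"],
})

# Same preparation as above: real NaN values, then forward fill.
df = df.replace("Nan", np.nan).ffill()

# One formatted line per client: name padded to 8 characters, products joined by ", ".
lines = [
    f"{client:<8}{', '.join(group['Product'])}"
    for client, group in df.groupby("Client", sort=False)
]
print("\n".join(lines))
```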

Upvotes: 0
