Philippe Haumesser

Reputation: 647

Pandas split and concatenate list result

I have a dataframe like this:

index               int64
idline              int64
name               object
idname             object
Amount            float64
UnitPrice         float64
Qty               float64
LineTxCodeId       object
TotalAmt          float64
Number             object
CurrencyRef        object
TxnDate            object
Customer           object
CustomerId         object
DueBalance        float64
TotalTaxesRate    float64
Classname          object
ClassId            object
year                int64
client             object

I have a list of customers with different names, and I want to group this dataframe to get the sum of orders by customer and year. To group customers whose names are nearly the same, I decided to keep only the first 3 words of the Customer column. This is my code:

df['year'] = pd.DatetimeIndex(df['TxnDate']).year  # add a year column
df['client'] = df['Customer'].str.split(' ').str[:3]  # add a column with the first 3 words

The issue is that df['client'] becomes a list in each row, like this: [San, francisco, design]

I want a string instead, like this: 'San Francisco design'

What should I do?

The goal is to run this groupby:

df1 = df.groupby(['client']).agg({'Amount': ['sum']})

It does not work now because each value in client is a list...

Thanks for helping.

Upvotes: 0

Views: 124

Answers (1)

Koralp Catalsakal

Reputation: 1124

You can chain .str.join(' ') when assigning the 'client' column, which turns each list of words back into a single string:

import pandas as pd 
df = pd.DataFrame(['San Francisco Design Company 1','San Francisco Design Company 2'],columns =['Customer'])
df['client'] = df['Customer'].str.split(' ').str[:3].str.join(' ')
print(df)
                         Customer                client
0  San Francisco Design Company 1  San Francisco Design
1  San Francisco Design Company 2  San Francisco Design
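
Once 'client' is a plain string, the groupby from the question should work. Here is a minimal sketch that also groups by year, as the question describes. The sample rows, the 'Acme Corp' customer and the `totals` name are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Customer": [
        "San Francisco Design Company 1",
        "San Francisco Design Company 2",
        "Acme Corp",
    ],
    "TxnDate": ["2019-01-15", "2019-06-30", "2020-03-01"],
    "Amount": [100.0, 250.0, 40.0],
})

# Year of each transaction.
df["year"] = pd.to_datetime(df["TxnDate"]).dt.year

# First 3 words of the customer name, joined back into one string.
# Names with fewer than 3 words are kept whole.
df["client"] = df["Customer"].str.split(" ").str[:3].str.join(" ")

# Sum of Amount per client and year.
totals = df.groupby(["client", "year"], as_index=False)["Amount"].sum()
print(totals)
```

Both 'San Francisco Design Company' rows end up under the same client key, so their amounts are added together for 2019.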

Upvotes: 1
