Reputation: 647
I have a DataFrame with the following columns and dtypes:
index int64
idline int64
name object
idname object
Amount float64
UnitPrice float64
Qty float64
LineTxCodeId object
TotalAmt float64
Number object
CurrencyRef object
TxnDate object
Customer object
CustomerId object
DueBalance float64
TotalTaxesRate float64
Classname object
ClassId object
year int64
client object
The Customer column contains many different names. I want to group this DataFrame to get the sum of Amount by customer and year. To group customers whose names are nearly the same, I decided to split the Customer value and keep only its first 3 words. This is my code:
df['year'] = pd.DatetimeIndex(df['TxnDate']).year  # add a year column
df['client'] = df['Customer'].str.split(' ').str[:3]  # add a column with the first 3 words
The issue is that df['client'] becomes a list for each row, like this: [San, Francisco, Design]
I want a string like this instead: 'San Francisco Design'
What should I do?
The goal is to run this groupby:
df1 = df.groupby(['client']).agg({'Amount': ['sum']})
It does not work right now because each value in client is a list.
Thanks for your help.
Upvotes: 0
Views: 124
Reputation: 1124
You can chain .str.join(' ') when assigning the 'client' column, which turns each list of words back into a single string:
import pandas as pd
df = pd.DataFrame(['San Francisco Design Company 1', 'San Francisco Design Company 2'], columns=['Customer'])
df['client'] = df['Customer'].str.split(' ').str[:3].str.join(' ')
print(df)
                         Customer                client
0  San Francisco Design Company 1  San Francisco Design
1  San Francisco Design Company 2  San Francisco Design
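Once 'client' holds plain strings, the groupby from the question should work, and 'year' can be added as a second key to get totals per customer and year. The following is only a minimal sketch: the sample rows, dates, and amounts are made up for illustration, and the `totals` name is hypothetical. It also uses `str.split()` with no argument, which splits on any run of whitespace, rather than `split(' ')` as in the question.

```python
import pandas as pd

# Made-up sample data standing in for the real DataFrame (assumption).
df = pd.DataFrame({
    "Customer": [
        "San Francisco Design Company 1",
        "San Francisco Design Company 2",
        "San Francisco Design Company 1",
        "Acme Corp",
    ],
    "TxnDate": ["2020-01-15", "2020-06-30", "2021-03-01", "2021-07-04"],
    "Amount": [100.0, 50.0, 25.0, 10.0],
})

# Year of each transaction.
df["year"] = pd.to_datetime(df["TxnDate"]).dt.year

# First three words of the customer name, joined back into one string.
# Names with fewer than three words are kept whole.
df["client"] = df["Customer"].str.split().str[:3].str.join(" ")

# Sum of Amount per client and year; 'totals' is a hypothetical name.
totals = df.groupby(["client", "year"], as_index=False)["Amount"].sum()
print(totals)
```

Note that truncating to three words is a rough heuristic: two unrelated customers that happen to share their first three words would be merged into one group.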
Upvotes: 1