Reputation: 625
I have the following dataframe
import pandas as pd
def remove_dup(string):
temp=string.split(',')
temp=[x.strip() for x in temp]
return ','.join(set(temp))
compnaies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google','Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android','Search']
df = pd.DataFrame({'company' : compnaies, 'product':products })
new_df=df.groupby('company').product.agg([('Number', 'count'), ('Product list', ', '.join)]).reset_index()
#create uniquevalues
new_df['uniquevalues']=new_df['Product list'].apply(remove_dup)
#create uniquecount
new_df['uniquecount']=new_df['uniquevalues'].str.split(',').str.len()
How to get comma seperated values in new column
i.e: Each new unique product as seperated column as shown in expected column : Expected Output:
company Number Product list uniquevalues uniquecount uniqueProduct 1 uniqueProduct 1 Count uniqueProduct2 uniqueProduct2 Count
0 Amazon 1 E-comm E-comm 1 E-comm 1
1 Facebook 1 Social Media Social Media 1 Social Media 1
2 Google 3 Search, Android, Android,Search 2 Android 1 Search 2
Search
3 Microsoft 2 OS, X-box X-box,OS 2 X-box 1 Os 1
Upvotes: 1
Views: 4610
Reputation: 862611
Use split
with expand=True
, change columns names and new column uniquecount
is count by DataFrame.count
for avoid double split
:
new_df=df.groupby('company').product.agg([('Number', 'count'),
('Product list', ', '.join)]).reset_index()
#create uniquevalues
new_df['uniquevalues']=new_df['Product list'].apply(remove_dup)
df1 = new_df['uniquevalues'].str.split(',', expand=True)
df1.columns = ['uniqueProduct{}'.format(x+1) for x in df1.columns]
new_df['uniquecount'] = df1.count(axis=1)
new_df = new_df.join(df1)
print (new_df)
company Number Product list uniquevalues uniquecount \
0 Amazon 1 E-comm E-comm 1
1 Facebook 1 Social Media Social Media 1
2 Google 3 Search, Android, Search Search,Android 2
3 Microsoft 2 OS, X-box OS,X-box 2
uniqueProduct1 uniqueProduct2
0 E-comm None
1 Social Media None
2 Search Android
3 OS X-box
If want replace None
to empty list add fillna
to last row of code:
new_df = new_df.join(df1.fillna(''))
print (new_df)
company Number Product list uniquevalues uniquecount \
0 Amazon 1 E-comm E-comm 1
1 Facebook 1 Social Media Social Media 1
2 Google 3 Search, Android, Search Search,Android 2
3 Microsoft 2 OS, X-box OS,X-box 2
uniqueProduct1 uniqueProduct2
0 E-comm
1 Social Media
2 Search Android
3 OS X-box
EDIT:
df = pd.DataFrame({'company' : compnaies, 'product':products })
def f(x):
count = x.count()
join = ','.join(x)
uniq = ','.join(x.unique())
uniqc = x.nunique()
vals = [count, join, uniq, uniqc]
names1 = ['Number','list','uniquevalues','uniquecount']
s = [y for x in list(x.value_counts().items()) for y in x]
L = ['uniqueProduct','count']
names = ['{}{}'.format(x, y) for y in range(1, len(s)//2+1) for x in L]
return pd.DataFrame([vals + s], columns=names1 + names)
new_df = (df.groupby('company')['product'].apply(f)
.reset_index(level=1, drop=True)
.reset_index()
.fillna(''))
print (new_df)
company Number list uniquevalues uniquecount \
0 Amazon 1 E-comm E-comm 1
1 Facebook 1 Social Media Social Media 1
2 Google 3 Search,Android,Search Search,Android 2
3 Microsoft 2 OS,X-box OS,X-box 2
uniqueProduct1 count1 uniqueProduct2 count2
0 E-comm 1
1 Social Media 1
2 Search 2 Android 1
3 OS 1 X-box 1
Upvotes: 2
Reputation: 75080
you entire solution at once which covers this question: How to give column names after count and joins?
df1 = df.groupby('company').product.agg([('count', 'count'), ('product', ', '.join)]).reset_index()
df1 = df1.drop('company',axis=1).join(df.groupby('company')['product'].unique().reset_index(),rsuffix='_unique')
df1['unique_values'] =[len(df1.product_unique[i]) for i in list(df1.product_unique.index)]
df1.product_unique = [(",".join(df1.product_unique[n])) for n in list(df1.product_unique.index)]
df1.join(df1.product_unique.str.split(",",expand=True))
You can then rename the columns: - df1.rename(columns={0:'Unique1',1:'Unique2'},inplace=True)
Upvotes: 1