Reputation: 13
How can I concatenate the unique values of several text columns of a pandas DataFrame into a single column? For example:
import pandas as pd

data = [[1, "US", "California", "Los Angeles"],
        [1, "US", "California", "San Francisco"],
        [1, "US", "California", "San Diego"],
        [1, "US", "Texas", "Austin"],
        [2, "IND", "Maharashtra", "Mumbai"],
        [2, "IND", "Maharashtra", "Pune"],
        [2, "IND", "Maharashtra", "Nagpur"]]
df = pd.DataFrame(data, columns=['Country_Id', 'Country', 'State', 'Place'])
From the above dataframe, how do I generate output with one field, Country_Id, and a second text field containing the unique values of Country, State, and Place?
Like:
Please ignore the meaning of the combined text field
Upvotes: 1
Views: 134
Reputation: 25239
Use groupby and apply, with a double join over unique inside a generator expression:
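# Select only the text columns so the integer Country_Id is never passed to str.join
(df.groupby('Country_Id')[['Country', 'State', 'Place']]
   .apply(lambda x: ' '.join(' '.join(x[col].unique()) for col in x))
   .to_frame('Country-State-Place'))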
Out[434]:
            Country-State-Place
Country_Id
1           US California Texas Los Angeles San Francisco San Diego Austin
2           IND Maharashtra Mumbai Pune Nagpur
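For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the sample data from the question. The names `text_cols` and `join_unique` are my own, added only for illustration. Selecting the text columns explicitly before `apply` is an assumption on my part about what is wanted: it keeps the integer `Country_Id` out of the string join, which matters on pandas versions that pass the grouping column to `apply`.

```python
import pandas as pd

data = [[1, "US", "California", "Los Angeles"],
        [1, "US", "California", "San Francisco"],
        [1, "US", "California", "San Diego"],
        [1, "US", "Texas", "Austin"],
        [2, "IND", "Maharashtra", "Mumbai"],
        [2, "IND", "Maharashtra", "Pune"],
        [2, "IND", "Maharashtra", "Nagpur"]]
df = pd.DataFrame(data, columns=["Country_Id", "Country", "State", "Place"])

# Hypothetical helper name: the columns whose unique values get combined.
text_cols = ["Country", "State", "Place"]

def join_unique(group: pd.DataFrame) -> str:
    # Inner join: unique values within one column, in order of first appearance.
    # Outer join: the per-column strings, in column order.
    return " ".join(" ".join(group[col].unique()) for col in text_cols)

result = (
    df.groupby("Country_Id")[text_cols]
      .apply(join_unique)
      .to_frame("Country-State-Place")
)
print(result)
```

Note that `unique()` works per column, so a value that shows up in two different columns (say, a state and a city with the same name) would still appear twice in the combined string.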
Upvotes: 2