Pylander
Pylander

Reputation: 1591

PANDAS Combine Rows And Preserve Column Order

I have a "long" format pandas dataframe of the following general structure:

id,date,color,size,density
1,201201,val1,val2,val3
1,201301,val1,val2,val3
1,201301,val1,val2,val3
2,201201,val1,val2,val3
2,201202,val1,val2,val3

The new "wide" format I am looking to create is this:

id,color_1,size_1,density_1,color_2,size_2,density_2,color_3,size_3,density_3
1,val1,val2,val3,val1,val2,val3,val1,val2,val3
2,val1,val2,val3,val1,val2,val3

Where the original row order of columns are preserved but they are now put in order of ascending date in single rows by id. When I try pd.pivot variations it does not preserve the column order. Perhaps a concat approach? Any advice is welcome.

UPDATE:

I've made some progress on this so here is my new base dataframe:

id, date, feature_vector (parens for clarity, not in data, comma seperated string field)
1,2012-01-01,(0,1,0,0,0,1)
1,2013-01-01,(0,0,1,0,0,1)
1,2013-01-02,(0,1,0,1,0,1)
2,2012-01-11,(0,1,0,0,1,1)
2,2012-02-11,(0,1,1,0,0,1)

I'm trying to create the following:

id, feature_vector
1,(0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,1)
2,(0,1,0,0,1,1,0,1,1,0,0,1)

I'm just trying to concatenate the feature vectors in date order now.

Upvotes: 1

Views: 174

Answers (1)

Matthew Emery
Matthew Emery

Reputation: 77

You could use the concat method, but I tried making your long dataframe and found it unwieldy and fragile even in your toy example. I would suggest using the groupby method.

grouped = df.sort('date', ascending=True).groupby('id')

If you need the concatenated version, try this:

columns = ['date', 'color', 'size', 'density']

first = grouped.nth(0)
first = first[columns]
first.rename(columns=lambda x: '{}_1'.format(x), inplace=True)

second = grouped.nth(1)
second = second[columns]
second.rename(columns=lambda x: '{}_2'.format(x), inplace=True)

new_df = pd.concat([first, second], axis=1)

Upvotes: 1

Related Questions