Nicole
Nicole

Reputation: 23

Most Efficient way to get index of item within a group in pandas dataframe

I have a data set which is in a long format, and I need to convert it to wide format, but I first need to create the column names. This original dataset looks like this:

Item    Date        Cost
----   -----------  -----
1102    10-12-2011  $1.00
1102    10-18-2011  $8.00
1102    10-15-2011  $11.00
1103    10-16-2011  $3.00
1104    11-18-2011  $7.00
1104    11-15-2011  $5.00

And I want to transform the dataset to look like this:

Item    charge_0  charge_1  charge_2
----    --------  --------  --------
1102     1.00       11.00     8.00
1103     3.00       NaN       NaN
1104     5.00       7.00      NaN

The charge number (column names) indicates the order in which the charges were received for each item (earliest date first).

I've gotten some code to work, but it's really slow, and inelegant. The data set I'm working with is a couple million rows, and I need to use the code on a much larger dataset, so efficiency is important.

# Sort dataframe by Item and Date
df_sorted=df.sort_values['Item','Date'], ascending=[1,1])
df_sorted.reset_index(drop=True, inplace=True)

# Get the order or charges (based on date) for each item
df_cost=df_sorted.groupby('Item').apply(lambda x:x['Cost']     
                                        .reset_index()).reset_index()
df_cost['colName']='charge_' + df_cost['level_1'].astype(str)

# Transform data from long to wide format
df_long=df_cost.pivot(index='Item', columns='colName', values='Cost')

Any suggestions or advice would be greatly appreciated. Thanks, Nicole

Upvotes: 1

Views: 586

Answers (1)

Andy Hayden
Andy Hayden

Reputation: 375485

You can get the charge number from cumcount:

In [11]: df["charge"] = df.groupby(["Item"]).cumcount()

In [12]: df.pivot_table("Cost", index="Item", columns="charge", aggfunc=lambda x: x)
Out[12]:
charge      0      1       2
Item
1102    $1.00  $8.00  $11.00
1103    $3.00    NaN     NaN
1104    $7.00  $5.00     NaN

Upvotes: 1

Related Questions