krisko08
krisko08

Reputation: 55

Sorting DF by a column with multiple values

In my main df, I have a column that is combined with two other columns, creating values that look like this: A1_43567_1. The first number represents a type of assessment taken, the second number being an question ID, and the final number being the question position on an assessment. I plan on creating a pivot table to have each unique value as a column to look across multiple students' selection per each item. But I want the order of the pivot to be by the Question Position, or the third value in the concatenation. Essentially this output:

    Student ID  A1_45678_1  A1_34551_2  A1_11134_3  etc....
    12345           1            0          0      
    12346           0            0          1
    12343           1            1          0

I've tried sorting my data frame by the original column I want it to be sorted by (Question Position) and then creating the pivot table, but that doesn't render the above result I'm looking for. Is there a way to sort the original concatenation values by the third value in the column? Or is it possible to sort a pivot table by the third value in each column?

Current code is:

   demo_pivot.sort(['Question Position'], ascending=True)

   demo_pivot['newcol'] = 'A' + str(interim_selection) + '_' + ,\
   demo_pivot['Item ID'].map(str) + "_" + demo_pivot['Question Position'].map(str)

   demo_pivot= pd.pivot_table(demo_pivot, index='Student ANET ID',values='Points Received',\
   columns='newcol').reset_index()

But generates this output:

    Student ID  A1_45678_1  A1_34871_7  A1_11134_15  etc....
    12345           1            0          0      
    12346           0            0          1
    12343           1            1          0

Upvotes: 3

Views: 543

Answers (1)

Zachary Cross
Zachary Cross

Reputation: 2318

The call to pd.pivot_table() returns a DataFrame, correct? If so, can you just reorder the columns of the resulting DataFrame? Something like:

def sort_columns(column_list):
    # Create a list of tuples: (question position, column name)
    sort_list = [(int(col.split('_')[2]), col) for col in column_list]

    # Sorts by the first item in each tuple, which is the question position
    sort_list.sort() 

    # Return the column names in the sorted order:
    return [x[1] for x in sort_list]

# Now, you should be able to reorder the DataFrame like so:
demo_pivot = demo_pivot.loc[:, sort_columns(demo_pivot.columns)] 

Upvotes: 2

Related Questions