How to sort a dataframe using a list of column names

Question

I am using Pandas 0.22.0 on Python 2.7 with PyCharm as the IDE.

I am trying to sort multiple Data Frames using a Loop. These data frames were created from .csv files and then to be converted to xlsx using the 'xlsxwriter' in pandas.

I have created a sorting list that has all the sorting requirements inside it, so that when I run my loop, it will pick up a csv file, convert it to a data frame, 'sort it' (where I'm getting stuck), and then output the entire thing as a .xlsx file so it can be played around in MSEXCEL.

If I use df = df.sort_values(by=['SITE', 'DEPARTMENT', 'LOCATION', 'ASSET_TYPE', 'ASSET_NAME']) then there are no issues.

But, if I use this: df = df.sort_values(by=sorts[0]), the code comes crashing down.

    Traceback (most recent call last):
      File "D:/OneDrive/Programming Practice/Python/Rubaiyat/test1.py", line 55, in 
        df = df.sort_values(by=(sorts[0]))
      File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 3619, in sort_values
        k = self.xs(by, axis=other_axis).values
      File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2335, in xs
        return self[key]
      File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
        return self._getitem_column(key)
      File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
        return self._get_item_cache(key)
      File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
        values = self._data.get(item)
      File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3843, in get
        loc = self.items.get_loc(item)
      File "C:\Python27\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
      File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
      File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: "'SITE', 'DEPARTMENT', 'LOCATION', 'ASSET_TYPE', 'ASSET_NAME'"

The entire code is as follows:

    import pandas
    import sys

    reload(sys)
    sys.setdefaultencoding('utf-8')

    reportDF = ["assetReport", "assetTypeReport", "assetStatusReport", "locationReport", "departmentReport", "siteReport",
                "userReport"]

    sheetNames = ["Asset Report", "Asset Types", "Asset Status", "Locations", "Cost Centers", "Sites", "Users"]


    columnNames = [("EPC", "Creation Date", "Modification Date", "Inventory Date", "Asset Name", "Asset Status",
                    "Asset Type", "Asset User", "Location", "Site", "Cost Center", "Description"),
                "Asset Type Name",
                ("Asset Status", "Asset Status Description"),
                ("Location Name", "EPC", "Floor", "GPS", "Capacity", "Lead Time", "Site Name"),
                "Cost Center",
                ("Site", "Country", "Postal Code", "City", "Address", "GPS"),
                ("User Name", "User Role", "First Name", "Last Name", "Email", "User Disabled?")]

    sorts = ["'SITE', 'DEPARTMENT', 'LOCATION', 'ASSET_TYPE', 'ASSET_NAME'",
            'ASSET_TYPE_NAME', 
            'ASSET_STATUS_NAME',
            "'SITE_NAME', 'LOCATION_NAME'",
            'DEPARTMENT_NAME',
            'SITE_NAME',
            'USER_NAME']

    writer = pandas.ExcelWriter('mergedSheet.xlsx')

    for i in range(0, 7):
        df = pandas.read_csv(reportDF[i], delimiter=';')
        df = df.sort_values(by=sorts[i])
        df.to_excel(writer, sheet_name=sheetNames[i], engine='xlsxwriter', header=columnNames[i], freeze_panes=(1, 0))

    writer.save()
    writer.close()

Any help or guidance will be very much appreciated. Thank You.

dmeu · Accepted Answer

you creating one string which is: "'SITE', 'DEPARTMENT', 'LOCATION', 'ASSET_TYPE', 'ASSET_NAME'".

I think it should look like this:

sorts = [['SITE', 'DEPARTMENT', 'LOCATION', 'ASSET_TYPE', 'ASSET_NAME'],
        'ASSET_TYPE_NAME', 
        'ASSET_STATUS_NAME',
        ['SITE_NAME', 'LOCATION_NAME'],
        'DEPARTMENT_NAME',
        'SITE_NAME',
        'USER_NAME']

How to sort a dataframe using a list of column names

Answers (1)

Related Questions