fukiburi
fukiburi

Reputation: 653

Appending two DataFrames and sorting columns with exception of first two

I'd like to concatenate two data frames, created from two lists:

import pandas as pd
import numpy as np

header_1 = ['A', 'B', -1, 3, 5, 7]
data_1 = ['X', 'Y', 1, 2, 3, 4]
d = pd.DataFrame(np.array([data_1]), columns=header_1)

header_2 = ['A', 'B', -2, 4, 5, 6]
data_2 = ['X', 'Z', 1, 2, 3, 4]
e = pd.DataFrame(np.array([data_2]), columns=header_2)

f = pd.concat([d, e])

> f
   A  B   -1    3  5    7   -2    4    6
0  X  Y    1    2  3    4  NaN  NaN  NaN
0  X  Z  NaN  NaN  3  NaN    1    2    4

However, I want my numerical columns to appear in sorted order and was wondering if there an easier way than splitting off the first two columns, sorting the remaining dataframe and concatenating the two again:

ab_cols = f[['A', 'B']]               # Copy of first two columns
g = f.drop(['A', 'B'], axis=1)        # Removing cols from dataframe
h = g.sort_index(axis=1)              # Sort remaining by column header
i = pd.concat([ab_cols, h], axis=1)   # Putting everything together again

> i
   A  B   -2   -1    3    4  5    6    7
0  X  Y  NaN    1    2  NaN  3  NaN    4
0  X  Z    1  NaN  NaN    2  3    4  NaN

I've thought about multi-indices, but I'm already using the index for something else (source of data row, not shown here), and I'm afraid a three-level multi-index might make it more complicated for slicing the dataframe later.

Upvotes: 1

Views: 219

Answers (2)

Nickil Maveli
Nickil Maveli

Reputation: 29721

Steps:

Make columns as a series representation with both index and values equal to the index keys.

Using pd.to_numeric with errors=coerce, to properly parse numeric values and handling string values as Nans.

Sort these values with pushing Nans(which were string values before) on top as and when they are encountered.

Taking their corresponding indices and re-arranging the DF based on these newly returned column labels.

c = pd.to_numeric(f.columns.to_series(), errors='coerce').sort_values(na_position='first')
f[c.index]

Image

Upvotes: 1

EdChum
EdChum

Reputation: 394159

The problem as you've probably found is that currently the concatenated columns can't be sorted due to mixed str and int types, what you can do is filter the columns to str and numerical types, sort the numerical types and then reindex with the new column order with str types at the beginning concatenated with the sorted numerical columns:

In [30]:
numerical_cols = f.columns[f.columns.to_series().apply(lambda x: type(x) != str)]
str_cols = f.columns[f.columns.to_series().apply(lambda x: type(x) == str)]
f.reindex(columns=str_cols.union(numerical_cols.sort_values()))

Out[30]:
   A  B   -2   -1    3    4  5    6    7
0  X  Y  NaN    1    2  NaN  3  NaN    4
0  X  Z    1  NaN  NaN    2  3    4  NaN

Upvotes: 0

Related Questions