Reputation: 653
I'd like to concatenate two data frames, created from two lists:
import pandas as pd
import numpy as np
header_1 = ['A', 'B', -1, 3, 5, 7]
data_1 = ['X', 'Y', 1, 2, 3, 4]
d = pd.DataFrame(np.array([data_1]), columns=header_1)
header_2 = ['A', 'B', -2, 4, 5, 6]
data_2 = ['X', 'Z', 1, 2, 3, 4]
e = pd.DataFrame(np.array([data_2]), columns=header_2)
f = pd.concat([d, e])
> f
A B -1 3 5 7 -2 4 6
0 X Y 1 2 3 4 NaN NaN NaN
0 X Z NaN NaN 3 NaN 1 2 4
However, I want my numerical columns to appear in sorted order and was wondering if there an easier way than splitting off the first two columns, sorting the remaining dataframe and concatenating the two again:
ab_cols = f[['A', 'B']] # Copy of first two columns
g = f.drop(['A', 'B'], axis=1) # Removing cols from dataframe
h = g.sort_index(axis=1) # Sort remaining by column header
i = pd.concat([ab_cols, h], axis=1) # Putting everything together again
> i
A B -2 -1 3 4 5 6 7
0 X Y NaN 1 2 NaN 3 NaN 4
0 X Z 1 NaN NaN 2 3 4 NaN
I've thought about multi-indices, but I'm already using the index for something else (source of data row, not shown here), and I'm afraid a three-level multi-index might make it more complicated for slicing the dataframe later.
Upvotes: 1
Views: 219
Reputation: 29721
Steps:
Make columns as a series representation with both index and values equal to the index keys.
Using pd.to_numeric
with errors=coerce
, to properly parse numeric values and handling string values as Nans
.
Sort these values with pushing Nans
(which were string values before) on top as and when they are encountered.
Taking their corresponding indices and re-arranging the DF
based on these newly returned column labels.
c = pd.to_numeric(f.columns.to_series(), errors='coerce').sort_values(na_position='first')
f[c.index]
Upvotes: 1
Reputation: 394159
The problem as you've probably found is that currently the concatenated columns can't be sorted due to mixed str
and int
types, what you can do is filter the columns to str
and numerical
types, sort the numerical
types and then reindex
with the new column order with str types at the beginning concatenated with the sorted numerical columns:
In [30]:
numerical_cols = f.columns[f.columns.to_series().apply(lambda x: type(x) != str)]
str_cols = f.columns[f.columns.to_series().apply(lambda x: type(x) == str)]
f.reindex(columns=str_cols.union(numerical_cols.sort_values()))
Out[30]:
A B -2 -1 3 4 5 6 7
0 X Y NaN 1 2 NaN 3 NaN 4
0 X Z 1 NaN NaN 2 3 4 NaN
Upvotes: 0