dartdog
dartdog

Reputation: 10862

Pandas selecting discontinuous columns from a dataframe

I am using the following to select specific columns from the dataframe comb, which I would like to bring into a new dataframe. The individual selects work fine EG: comb.ix[:,0:1], but when I attempt to combine them using the + I get a bad result the 1st selection ([:,0:1]) getting stuck on the end of the dataframe and the values contained in original col 1 are wiped out while appearing at the end of the row. What is the right way to get just the columns I want? (I'd include sample data but as you may see, too many columns...which is why I'm trying to do it this way)

comb.ix[:,0:1]+comb.ix[:,17:342]

Upvotes: 12

Views: 14367

Answers (3)

neves
neves

Reputation: 39233

NumPy has a nice module named r_, allowing you to solve it with the modern DataFrame selection interface, iloc:

df.iloc[:, np.r_[0:1, 17:342]]

I believe this is a more elegant solution.

It even support more complex selections:

df.iloc[:, np.r_[0:1, 5, 16, 17:342:2, -5:]]

Upvotes: 9

I recently solved it by just appending ranges

r1 = pd.Series(range(5))
r2 = pd.Series([10,15,20])
final_range = r1.append(r2)
df.iloc[:,final_range]

Then you will get columns from 0:5 and 10, 15, 20.

Upvotes: 4

EdChum
EdChum

Reputation: 394101

If you want to concatenate a sub selection of your df columns then use pd.concat:

pd.concat([comb.ix[:,0:1],comb.ix[:,17:342]], axis=1)

So long as the indices match then this will align correctly.

Thanks to @iHightower that you can also sub-select by passing the labels:

pd.concat([df.ix[:,'Col1':'Col5'],df.ix[:,'Col9':'Col15']],a‌​xis=1)

Note that .ix will be deprecated in a future version the following should work:

In [115]:
df = pd.DataFrame(columns=['col' + str(x) for x in range(10)])
df

Out[115]:
Empty DataFrame
Columns: [col0, col1, col2, col3, col4, col5, col6, col7, col8, col9]
Index: []

In [118]:
pd.concat([df.loc[:, 'col2':'col4'], df.loc[:, 'col7':'col8']], axis=1)
​
Out[118]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []

Or using iloc:

In [127]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')]], axis=1)

Out[127]:
Empty DataFrame
Columns: [col2, col3, col7]
Index: []

Note that iloc slicing is open/closed so the end range is not included so you'd have to find the column after the column of interest if you want to include it:

In [128]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')+1], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')+1]], axis=1)

Out[128]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []

Upvotes: 13

Related Questions