Reputation: 1047

slice pandas dataframe to get noncontiguous columns

I have a pandas.DataFrame: wordvecs_df, with columns labeled 'word', 'count', 'v1' through 'v50' and 'norm1' through 'norm50' in that order. I want to create a new pandas df with just the columns for 'word', 'count' and norm1-norm50.

wordvecs_df.loc[:,"norm1":"norm50"]

gets me norm1-norm50, but if I try to put in word and count I get an IndexingError: Too many indexers.

I can't figure out how to get just the columns I want out of the dataframe. Any ideas?

Upvotes: 2

Answers (4)

piRSquared

Reputation: 294338

Setup
Let's use a smaller example

i = [0, 1]
c = range(1, 5)
wordvecs_df = pd.concat([
    pd.DataFrame(1, i, ['word', 'count']),
    pd.DataFrame(1, i, c).add_prefix('norm'),
    pd.DataFrame(1, i, c).add_prefix('v')
], axis=1)

wordvecs_df

   word  count  norm1  norm2  norm3  norm4  v1  v2  v3  v4
0     1      1      1      1      1      1   1   1   1   1
1     1      1      1      1      1      1   1   1   1   1

Solution
Use pd.DataFrame.filter to grab all things that look like 'norm'

wordvecs_df.filter(regex='^norm\d\d?')

   norm1  norm2  norm3  norm4
0      1      1      1      1
1      1      1      1      1

We can tack it on to our other two columns via pd.DataFrame.join or pd.concat

wordvecs_df[['word', 'count']].join(
    wordvecs_df.filter(regex='^norm\d\d?'))

   word  count  norm1  norm2  norm3  norm4
0     1      1      1      1      1      1
1     1      1      1      1      1      1

pd.concat([
    wordvecs_df[['word', 'count']],
    wordvecs_df.filter(regex='^norm\d\d?')
], axis=1)

   word  count  norm1  norm2  norm3  norm4
0     1      1      1      1      1      1
1     1      1      1      1      1      1

Upvotes: 0

THN

Reputation: 3621

Stephen's answer works well, but in case the column names do not have nice format, the general solutions would be getting the column location, building the slice index, then accessing using iloc.

E.g.:

word_loc = wordvecs_df.columns.get_loc('word')
count_loc = wordvecs_df.columns.get_loc('count')
norm1_loc = wordvecs_df.columns.get_loc('norm1')
norm50_loc = wordvecs_df.columns.get_loc('norm50')

slice = np.r_[word_loc, count_loc, norm1_loc:norm50_loc]

df.iloc[:, slice]

Upvotes: 3

Allen Qin

Reputation: 19947

You can use pd.concat:

pd.concat([df[['word','count']], df.loc[:,'norm1':'norm50']], 1)

Upvotes: 0

Stephen Rauch

Reputation: 49814

You can build up a list of column names like:

columns = ['word', 'count'] + ['norm%d' % i for i in range(1, 51)]
wordvecs_df.loc[:,columns]

Upvotes: 5

slice pandas dataframe to get noncontiguous columns

Answers (4)

Related Questions