profhoff
profhoff

Reputation: 1047

slice pandas dataframe to get noncontiguous columns

I have a pandas.DataFrame: wordvecs_df, with columns labeled 'word', 'count', 'v1' through 'v50' and 'norm1' through 'norm50' in that order. I want to create a new pandas df with just the columns for 'word', 'count' and norm1-norm50.

wordvecs_df.loc[:,"norm1":"norm50"]

gets me norm1-norm50, but if I try to put in word and count I get an IndexingError: Too many indexers.

I can't figure out how to get just the columns I want out of the dataframe. Any ideas?

Upvotes: 2

Views: 2514

Answers (4)

piRSquared
piRSquared

Reputation: 294338

Setup
Let's use a smaller example

i = [0, 1]
c = range(1, 5)
wordvecs_df = pd.concat([
    pd.DataFrame(1, i, ['word', 'count']),
    pd.DataFrame(1, i, c).add_prefix('norm'),
    pd.DataFrame(1, i, c).add_prefix('v')
], axis=1)

wordvecs_df

   word  count  norm1  norm2  norm3  norm4  v1  v2  v3  v4
0     1      1      1      1      1      1   1   1   1   1
1     1      1      1      1      1      1   1   1   1   1

Solution
Use pd.DataFrame.filter to grab all things that look like 'norm'

wordvecs_df.filter(regex='^norm\d\d?')

   norm1  norm2  norm3  norm4
0      1      1      1      1
1      1      1      1      1

We can tack it on to our other two columns via pd.DataFrame.join or pd.concat

wordvecs_df[['word', 'count']].join(
    wordvecs_df.filter(regex='^norm\d\d?'))

   word  count  norm1  norm2  norm3  norm4
0     1      1      1      1      1      1
1     1      1      1      1      1      1

Or

pd.concat([
    wordvecs_df[['word', 'count']],
    wordvecs_df.filter(regex='^norm\d\d?')
], axis=1)

   word  count  norm1  norm2  norm3  norm4
0     1      1      1      1      1      1
1     1      1      1      1      1      1

Upvotes: 0

THN
THN

Reputation: 3621

Stephen's answer works well, but in case the column names do not have nice format, the general solutions would be getting the column location, building the slice index, then accessing using iloc.

E.g.:

word_loc = wordvecs_df.columns.get_loc('word')
count_loc = wordvecs_df.columns.get_loc('count')
norm1_loc = wordvecs_df.columns.get_loc('norm1')
norm50_loc = wordvecs_df.columns.get_loc('norm50')

slice = np.r_[word_loc, count_loc, norm1_loc:norm50_loc]

df.iloc[:, slice]

Upvotes: 3

Allen Qin
Allen Qin

Reputation: 19947

You can use pd.concat:

pd.concat([df[['word','count']], df.loc[:,'norm1':'norm50']], 1)

Upvotes: 0

Stephen Rauch
Stephen Rauch

Reputation: 49814

You can build up a list of column names like:

columns = ['word', 'count'] + ['norm%d' % i for i in range(1, 51)]
wordvecs_df.loc[:,columns]

Upvotes: 5

Related Questions