Reputation: 1047
I have a pandas.DataFrame
: wordvecs_df
, with columns labeled 'word'
, 'count'
, 'v1'
through 'v50'
and 'norm1'
through 'norm50'
in that order. I want to create a new pandas df with just the columns for 'word'
, 'count'
and norm1-norm50
.
wordvecs_df.loc[:,"norm1":"norm50"]
gets me norm1
-norm50
, but if I try to put in word and count I get an IndexingError: Too many indexers.
I can't figure out how to get just the columns I want out of the dataframe. Any ideas?
Upvotes: 2
Views: 2514
Reputation: 294338
Setup
Let's use a smaller example
i = [0, 1]
c = range(1, 5)
wordvecs_df = pd.concat([
pd.DataFrame(1, i, ['word', 'count']),
pd.DataFrame(1, i, c).add_prefix('norm'),
pd.DataFrame(1, i, c).add_prefix('v')
], axis=1)
wordvecs_df
word count norm1 norm2 norm3 norm4 v1 v2 v3 v4
0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
Solution
Use pd.DataFrame.filter
to grab all things that look like 'norm'
wordvecs_df.filter(regex='^norm\d\d?')
norm1 norm2 norm3 norm4
0 1 1 1 1
1 1 1 1 1
We can tack it on to our other two columns via pd.DataFrame.join
or pd.concat
wordvecs_df[['word', 'count']].join(
wordvecs_df.filter(regex='^norm\d\d?'))
word count norm1 norm2 norm3 norm4
0 1 1 1 1 1 1
1 1 1 1 1 1 1
Or
pd.concat([
wordvecs_df[['word', 'count']],
wordvecs_df.filter(regex='^norm\d\d?')
], axis=1)
word count norm1 norm2 norm3 norm4
0 1 1 1 1 1 1
1 1 1 1 1 1 1
Upvotes: 0
Reputation: 3621
Stephen's answer works well, but in case the column names do not have nice format, the general solutions would be getting the column location, building the slice index, then accessing using iloc.
E.g.:
word_loc = wordvecs_df.columns.get_loc('word')
count_loc = wordvecs_df.columns.get_loc('count')
norm1_loc = wordvecs_df.columns.get_loc('norm1')
norm50_loc = wordvecs_df.columns.get_loc('norm50')
slice = np.r_[word_loc, count_loc, norm1_loc:norm50_loc]
df.iloc[:, slice]
Upvotes: 3
Reputation: 19947
You can use pd.concat:
pd.concat([df[['word','count']], df.loc[:,'norm1':'norm50']], 1)
Upvotes: 0
Reputation: 49814
You can build up a list of column names like:
columns = ['word', 'count'] + ['norm%d' % i for i in range(1, 51)]
wordvecs_df.loc[:,columns]
Upvotes: 5