user308827

Reputation: 21991

Extracting columns from pandas dataframe without hard coding

Is there a way to extract a subset of columns from a pandas dataframe without specifying all of the columns? E.g., I have a dataframe with the following columns: str_ID, num_ID, 1990, 1991, 1992, 1993, 1994, 1995, and I want to extract the columns from 1990 onwards. How do I do that without hard coding them?

df.columns.values
array(['str_ID', 'num_ID', 1990, 1991, 1992, 1993, 1994, 1995], dtype=object)
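If the year columns are contiguous and everything from 1990 onwards should be kept, a positional slice avoids listing any names. The sketch below is a minimal example under that assumption: only the column labels come from the question, and the cell values are made up.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's columns:
# two ID columns followed by integer year labels (values are made up).
cols = ['str_ID', 'num_ID', 1990, 1991, 1992, 1993, 1994, 1995]
df = pd.DataFrame([list(range(8)), list(range(10, 18))], columns=cols)

# Find the integer position of the first year column, then slice from there.
start = df.columns.get_loc(1990)
years = df.iloc[:, start:]
print(years.columns.tolist())  # [1990, 1991, 1992, 1993, 1994, 1995]
```

`get_loc` returns a single integer position only when the label is unique; with duplicate labels it can return a slice or boolean mask instead, so this relies on 1990 appearing once.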

Upvotes: 1

Views: 473

Answers (2)

Joe

Reputation: 12417

Another option, if the headers are strings and there are no years before 1900:

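import pandas as pd

df = pd.DataFrame({'str_ID': [4, 2, 4, 5, 5, 4],
                   'num_ID': [4, 2, 4, 5, 5, 4],
                   '1990': [4, 3, 1, 2, 2, 4],
                   '1991': [1, 2, 4, 5, 5, 3],
                   '1992': [4, 3, 2, 2, 2, 4],
                   '1993': [4, 3, 2, 2, 2, 4]})
print(df)
   1990  1991  1992  1993  num_ID  str_ID
0     4     1     4     4       4       4
1     3     2     3     3       2       2
2     1     4     2     2       4       4
3     2     5     2     2       5       5
4     2     5     2     2       5       5
5     4     3     4     4       4       4

# Check isdigit() first so int() is only called on all-digit labels;
# comparing a str label directly to 1990 raises TypeError in Python 3.
columns = [x for x in df.columns if x.isdigit() and int(x) >= 1990]
df = df[columns]
print(df)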

Output:

   1990  1991  1992  1993
0     4     1     4     4
1     3     2     3     3
2     1     4     2     2
3     2     5     2     2
4     2     5     2     2
5     4     3     4     4
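A regex-based variant of the same idea uses `DataFrame.filter`, which keeps the column labels that match a regular expression. This is a swapped-in technique rather than the answer's own code, and the four-digit pattern is an assumption about what a year header looks like.

```python
import pandas as pd

# Sample frame from the answer above (string headers).
df = pd.DataFrame({'str_ID': [4, 2, 4, 5, 5, 4],
                   'num_ID': [4, 2, 4, 5, 5, 4],
                   '1990': [4, 3, 1, 2, 2, 4],
                   '1991': [1, 2, 4, 5, 5, 3],
                   '1992': [4, 3, 2, 2, 2, 4],
                   '1993': [4, 3, 2, 2, 2, 4]})

# Keep only columns whose label is exactly four digits (year-like headers).
year_cols = df.filter(regex=r'^\d{4}$')

# Apply the numeric cutoff; int() is safe because every label matched \d{4}.
result = year_cols.loc[:, [int(c) >= 1990 for c in year_cols.columns]]
print(result.columns.tolist())  # ['1990', '1991', '1992', '1993']
```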

Upvotes: 1

Alexander

Reputation: 109626

You can use a conditional comprehension over the dataframe's columns (this assumes the column labels for the years are integers):

df[sorted(col for col in df if isinstance(col, int) and col >= 1990)]

This selects the integer column labels greater than or equal to 1990 and returns them in sorted order.
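One caveat worth hedging: on Python 3, NumPy integer scalars such as `np.int64` are not subclasses of `int`, so `isinstance(col, int)` would skip a year label stored that way. The sketch below swaps in `pandas.api.types.is_integer`, which accepts both Python ints and NumPy integer scalars and treats bools as non-integers. The frame is hypothetical, with one label deliberately stored as `np.int64`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer

# Hypothetical frame with the question's layout; the 1992 label is
# deliberately a NumPy integer scalar rather than a Python int.
cols = pd.Index(['str_ID', 'num_ID', 1990, 1991, np.int64(1992), 1993],
                dtype=object)
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# is_integer() accepts Python ints and NumPy integer scalars alike,
# so the np.int64 label is kept alongside the plain ints.
year_cols = sorted(col for col in df if is_integer(col) and col >= 1990)
subset = df[year_cols]
print(subset.columns.tolist())
```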

Upvotes: 2
