Reputation: 21991
Is there a way to extract a subset of columns from a pandas DataFrame without specifying all of the columns? For example, I have a DataFrame with the following columns:
str_ID, num_ID, 1990, 1991, 1992, 1993, 1994, 1995
and I want to extract the columns from 1990 onwards. How do I do that without hard-coding them?
df.columns.values
array(['str_ID', 'num_ID', 1990, 1991, 1992, 1993, 1994, 1995], dtype=object)
Upvotes: 1
Views: 473
Reputation: 12417
Another option, if the headers are strings and there are no years before 1900:
import pandas as pd

# Sample frame whose year headers are strings
df = pd.DataFrame({'str_ID': [4,2,4,5,5,4],
                   'num_ID': [4,2,4,5,5,4],
                   '1990': [4,3,1,2,2,4],
                   '1991': [1,2,4,5,5,3],
                   '1992': [4,3,2,2,2,4],
                   '1993': [4,3,2,2,2,4]})
print(df)
   1990  1991  1992  1993  num_ID  str_ID
0     4     1     4     4       4       4
1     3     2     3     3       2       2
2     1     4     2     2       4       4
3     2     5     2     2       5       5
4     2     5     2     2       5       5
5     4     3     4     4       4       4
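# Keep only all-digit labels whose numeric value is at least 1990.
# int(x) is required: comparing a str with an int raises TypeError in Python 3.
columns = [x for x in df.columns if x.isdigit() and int(x) >= 1990]
df = df[columns]
print(df)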
Output:
   1990  1991  1992  1993
0     4     1     4     4
1     3     2     3     3
2     1     4     2     2
3     2     5     2     2
4     2     5     2     2
5     4     3     4     4
Upvotes: 1
Reputation: 109626
You can use a conditional comprehension on the columns of the dataframe (assumes the column titles for the years are integers):
df[sorted(col for col in df if isinstance(col, int) and col >= 1990)]
This keeps only the integer column labels greater than or equal to 1990 and selects those columns in sorted order.
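As a minimal end-to-end sketch of the comprehension above, the example below builds a small frame laid out like the one in the question and selects the year columns. The sample values, the helper name `year_columns`, and its `start` parameter are made up for illustration. Checking against `numbers.Integral` instead of `int` is also an assumption on my part, so that NumPy integer labels would be matched as well.

```python
import numbers

import pandas as pd

# Hypothetical data mirroring the question's layout: two ID columns
# followed by integer year labels.
df = pd.DataFrame(
    {
        "str_ID": ["a", "b"],
        "num_ID": [1, 2],
        1990: [10, 20],
        1991: [11, 21],
        1992: [12, 22],
    }
)


def year_columns(frame, start=1990):
    """Return the integer column labels >= start, in ascending order."""
    return sorted(
        col
        for col in frame.columns
        # The isinstance check runs first, so string labels are never
        # compared with an int. bool is excluded because True/False
        # would otherwise count as the integers 1/0.
        if isinstance(col, numbers.Integral)
        and not isinstance(col, bool)
        and col >= start
    )


years = df[year_columns(df)]
print(list(years.columns))  # [1990, 1991, 1992]
```

Passing a different `start`, for example `year_columns(df, start=1992)`, narrows the selection without hard-coding any column names.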
Upvotes: 2