Reputation: 621
I have data as follows. First 2 columns are strings and columns names of columns df[3:60] as consecutive Year. How can I extract all the columns with years between 2005: 2010 and 2015 to everything
Country Indicator 1960 1961 1962 1963.....
Aruba US$ 15678 156789 156790 156791
Afgha US$ 68239 78239 88239 98239
Angola US$ 45678 55678 65678 75678
Albania US$ 89345 99345 109345 119345
Andorra US$ 62790 72790 82790 92790
Arab US$ 12987 22987 32987 42987
UAE US$ 6047 16047 26047 36047
I tried extracting index of columns
df.index.get_loc('2005') <- 45
df.index.get_loc('2010') <- 50
df.index.get_loc('2015') <- 55
df.iloc[:, [45:50,55:]]
The above code shows an error. How can I extract multiple columns with index ranging
Upvotes: 1
Views: 334
Reputation: 28644
I think @anky's use of np.r_ is the right way to go and pretty flexible, this answer is just an alternative, using pandas builtin index method:
NB: i'm using @anky's example dataframe:
df = pd.DataFrame(columns=list('ab') +
[*map(str,pd.date_range('2000','2021',freq='y').year)])
use slice_indexer to get the slice locations of the interested values :
A = df.columns.slice_indexer('2005','2010')
A
slice(7, 13, None)
#if one entry is included, it includes the location of the last index
B = df.columns.slice_indexer('2015')
B
slice(17, 23, None)
add the iloc indexing of both A and B:
res = df.iloc[:,A] + df.iloc[:,B]
res
Index(['2005', '2006', '2007', '2008', '2009', '2010', '2015', '2016', '2017',
'2018', '2019', '2020'],
dtype='object')
also note that @anky's method will be more efficient, since the iloc is called only once. again, this is just a play with the available methods
of course, you can combine np.r_ with the slices of A and B:
res = df.iloc[:,np.r_[A,B]]
res.columns
Index(['2005', '2006', '2007', '2008', '2009', '2010', '2015', '2016', '2017',
'2018', '2019', '2020'],
dtype='object')
Upvotes: 1
Reputation: 75080
You can use np.r_
:
a = df.columns.get_loc('2005')
b = df.columns.get_loc('2010')
c = df.columns.get_loc('2015')
df.iloc[:,np.r_[a-1:b,c-1:len(df.columns)]]
Example:
df = pd.DataFrame(columns=list('ab') +
[*map(str,pd.date_range('2000','2021',freq='y').year)])
print(df)
Empty DataFrame
Columns: [a, b, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
Index: []
print(df.iloc[:,np.r_[a-1:b,c-1:len(df.columns)]])
Empty DataFrame
Columns: [2005, 2006, 2007, 2008, 2009, 2010, 2015, 2016, 2017, 2018, 2019, 2020]
Index: []
Upvotes: 2