user308827
user308827

Reputation: 21971

Extracting multiple rows from pandas dataframe and converting to columns

I would like to transform the foll. dataframe:

    index    YR   BIOM   RWT site
0       0  2008   0.53  0.20    1
1       1  2009   3.23  1.18    1
2       2  2010  11.51  3.94    1
3       3  2011  18.14  5.82    1
4       4  2012  22.88  6.73    1
5       5  2013  26.65  7.20    1
6       0  2008   0.39  0.15   10
7       1  2009   2.43  0.90   10
8       2  2010   8.95  3.09   10
9       3  2011  16.63  5.38   10
10      4  2012  24.36  7.23   10
11      5  2013  29.72  8.10   10

I would like to extract the BIOM values for the 2 unique site i.e. 1 and 10 for the YR 2008 and 2013 so that I get something like this:

index    BIOM_2008     BIOM_2013    site
0         0.53         26.65        1
1         26.65        29.72        10

This is what I am doing:

lst_yrs = [2008, 2013]
sub_df = df[['YR', 'BIOM', 'site']]

for yr in lst_yrs:
    sub_df['BIOM'+str(yr)] = sub_df.loc['YR' == yr] 

Not sure how to get the for loop right.

Upvotes: 1

Views: 1452

Answers (1)

Woody Pride
Woody Pride

Reputation: 13955

I am not sure you need a loop here. You can simply create a slice of the data you want, set the index and then unstack as follows:

import pandas as pd

DF = pd.DataFrame({
'site' : [1, 1, 1, 1, 1, 1, 10, 10, 10],
'BIOM' : [0.53, 3.23, 11.51, 18.14, 22.88, 26.65, 0.39, 2.43, 8.95],
'YR'   : ['2008', '2009', '2010', '2011', '2012', '2013', '2008', '2009', '2010']
                  })

slice = DF[(DF['site'].isin([1, 10]) & DF['YR'].isin(['2008', '2013']))]

result = slice.set_index(['site','YR']).unstack('YR')

Which gives the following:

      BIOM       
YR    2008   2013
site             
1     0.53  26.65
10    0.39    NaN

In response to your comment, to flatten the hierarchical column index do the following:

result.columns = [' '.join(col).strip() for col in result.columns.values]


   site  BIOM 2008  BIOM 2013
0     1       0.53      26.65
1    10       0.39        NaN

Possibly this is now more complex than your original loop idea but I think it uses the pandas functionality in a more sophisticated way.

Upvotes: 2

Related Questions