Souvik Ray
Souvik Ray

Reputation: 3038

Selecting column values of a dataframe which are in a range and put them in appropriate columns of another dataframe in pandas

I have a csv file which is something like below

date,mean,min,max,std,metric
2018-03-15,3.9999999999999964,inf,34.0,100.0,cpu
2018-03-16,0.46403712296984756,90.0,40.0,inf,mem
2018-03-17,2.32452732452731,0.0,0.0,143.2191767899579,cpu
2018-03-18,52.8571428571428523,inf,0.0,100.0,cpu
2018-03-20,0.6928406466512793,45.33,67.96,inf,mem
2018-03-22,2.8675703858185635,,0.0,119.05383697172658,cpu

I want to select those column values in a row which are > 30 and < 300 that is (30 to 300) and put those values along with date in another column of a dataframe.

So I want to get the date, values from the csv and add it into the new dataframe at appropriate columns.Something like

date        metric       stat  val
2018-02-15  cpu          max   34.0
2018-03-15  cpu          std   100.0
2018-03-16  mem          min   90.0
2018-03-16  mem          max   40.0
2018-03-17  cpu          std   143.2191767899579
2018-03-18  cpu          mean  252.8571428571428523
2018-03-18  cpu          std   100.0
....        ....         ....

What I have done so far is

df_app = df.loc[:, ['mean', 'min', 'max', 'std', 'metric']]\
            .fillna(0)

df_app['percentage_change'] = np.maximum(df_app['mean'], df_app['std'])

But I can only get the maximum value in a row and not values in a range.How can I achieve it?

Upvotes: 1

Views: 49

Answers (1)

jezrael
jezrael

Reputation: 863451

I think need melt or set_index and stack for reshape and then filter by boolean indexing with between:

df = df.melt(['date','metric'], value_name='val', var_name='stat')
df = df[df['val'].between(30, 300)].sort_values(['date','metric'])

Alternative solution:

df = (df.set_index(['date','metric'])
        .stack()
        .reset_index(name='val')
        .rename(columns={'level_2':'stat'}))
df = df[df['val'].between(30, 300)]

print (df)
          date metric  stat         val
12  2018-03-15    cpu   max   34.000000
18  2018-03-15    cpu   std  100.000000
7   2018-03-16    mem   min   90.000000
13  2018-03-16    mem   max   40.000000
20  2018-03-17    cpu   std  143.219177
3   2018-03-18    cpu  mean   52.857143
21  2018-03-18    cpu   std  100.000000
10  2018-03-20    mem   min   45.330000
16  2018-03-20    mem   max   67.960000
23  2018-03-22    cpu   std  119.053837

EDIT:

If DatetimeIndex use:

df = df.reset_index().melt(['date','metric'], value_name='val', var_name='stat')
df = df[df['val'].between(30, 300)].sort_values(['date','metric'])

df = (df.set_index('metric', append=True)
        .stack()
        .reset_index(name='val')
        .rename(columns={'level_2':'stat'}))
df = df[df['val'].between(30, 300)]

Upvotes: 1

Related Questions