Reputation: 3038
I have a csv file which is something like below
date,mean,min,max,std,metric
2018-03-15,3.9999999999999964,inf,34.0,100.0,cpu
2018-03-16,0.46403712296984756,90.0,40.0,inf,mem
2018-03-17,2.32452732452731,0.0,0.0,143.2191767899579,cpu
2018-03-18,52.8571428571428523,inf,0.0,100.0,cpu
2018-03-20,0.6928406466512793,45.33,67.96,inf,mem
2018-03-22,2.8675703858185635,,0.0,119.05383697172658,cpu
I want to select those column values in a row which are > 30 and < 300 that is (30 to 300) and put those values along with date in another column of a dataframe.
So I want to get the date, values from the csv and add it into the new dataframe at appropriate columns.Something like
date metric stat val
2018-02-15 cpu max 34.0
2018-03-15 cpu std 100.0
2018-03-16 mem min 90.0
2018-03-16 mem max 40.0
2018-03-17 cpu std 143.2191767899579
2018-03-18 cpu mean 252.8571428571428523
2018-03-18 cpu std 100.0
.... .... ....
What I have done so far is
df_app = df.loc[:, ['mean', 'min', 'max', 'std', 'metric']]\
.fillna(0)
df_app['percentage_change'] = np.maximum(df_app['mean'], df_app['std'])
But I can only get the maximum value in a row and not values in a range.How can I achieve it?
Upvotes: 1
Views: 49
Reputation: 863451
I think need melt
or set_index
and stack
for reshape and then filter by boolean indexing
with between
:
df = df.melt(['date','metric'], value_name='val', var_name='stat')
df = df[df['val'].between(30, 300)].sort_values(['date','metric'])
Alternative solution:
df = (df.set_index(['date','metric'])
.stack()
.reset_index(name='val')
.rename(columns={'level_2':'stat'}))
df = df[df['val'].between(30, 300)]
print (df)
date metric stat val
12 2018-03-15 cpu max 34.000000
18 2018-03-15 cpu std 100.000000
7 2018-03-16 mem min 90.000000
13 2018-03-16 mem max 40.000000
20 2018-03-17 cpu std 143.219177
3 2018-03-18 cpu mean 52.857143
21 2018-03-18 cpu std 100.000000
10 2018-03-20 mem min 45.330000
16 2018-03-20 mem max 67.960000
23 2018-03-22 cpu std 119.053837
EDIT:
If DatetimeIndex
use:
df = df.reset_index().melt(['date','metric'], value_name='val', var_name='stat')
df = df[df['val'].between(30, 300)].sort_values(['date','metric'])
df = (df.set_index('metric', append=True)
.stack()
.reset_index(name='val')
.rename(columns={'level_2':'stat'}))
df = df[df['val'].between(30, 300)]
Upvotes: 1