Blazej Kowalski

Reputation: 367

clipping dataframe in python

I would like to create a new dataframe from an existing one that keeps only the values exceeding the mean of their column. Each column has a different mean, and I do not want to calculate each mean separately and then clip each column against its own value. I tried a double loop, because the number of rows and columns varies, but with no success. For example, I have the following dataframe:

a  b  c
4  5  6
1  2  3
7  9  2
3  6  8

I calculate the mean of every column and then want to create a new dataframe containing only the values greater than the mean of their respective column, so:

a1  b1  c1
4   9   6
7   6   8

I am not even sure whether this is possible, because the columns in the new dataframe may end up with different lengths, but maybe the missing entries can be filled with NaN? I am not sure what the right solution should be.

Upvotes: 1

Views: 402

Answers (1)

jezrael

Reputation: 862581

You can compare the values with the column means and replace the non-matching ones with NaN, either by boolean indexing or with where:

df = df[df > df.mean()]

Or:

df = df.where(df > df.mean())

print (df)
     a    b    c
0  4.0  NaN  6.0
1  NaN  NaN  NaN
2  7.0  9.0  NaN
3  NaN  6.0  8.0
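For a version that runs on its own, here is a minimal sketch that rebuilds the frame from the question and checks that both forms give the same result. The variable names (col_means, mask, masked_by_indexing, masked_by_where) are only illustrative:

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"a": [4, 1, 7, 3], "b": [5, 2, 9, 6], "c": [6, 3, 2, 8]})

# One mean per column: a=3.75, b=5.5, c=4.75.
col_means = df.mean()

# Comparing a DataFrame with a Series aligns the Series index with the
# column labels, so each column is compared against its own mean.
mask = df > col_means

# Both forms keep the values where the mask is True and put NaN elsewhere.
masked_by_indexing = df[mask]
masked_by_where = df.where(mask)

print(masked_by_indexing.equals(masked_by_where))
```

The columns become float because NaN cannot be stored in an integer column.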

If you also want to remove the NaNs, so that the remaining values move up to the first rows, add a custom function with dropna:

df = df[df > df.mean()].apply(lambda x: pd.Series(x.dropna().values))
print (df)
     a    b    c
0  4.0  9.0  6.0
1  7.0  6.0  8.0

In general, if a column has fewer values above its mean than the others, it gets NaNs at the end:

print (df)
   a  b  c
0  4  5  6
1  1  2  3
2  7  9  2
3  3  6  8
4  3  6  8

print (df[df > df.mean()])
     a    b    c
0  4.0  NaN  6.0
1  NaN  NaN  NaN
2  7.0  9.0  NaN
3  NaN  6.0  8.0
4  NaN  6.0  8.0

df = df[df > df.mean()].apply(lambda x: pd.Series(x.dropna().values))
print (df)
     a    b    c
0  4.0  9.0  6.0
1  7.0  6.0  8.0
2  NaN  6.0  8.0
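The same steps can be wrapped in a small helper. This is only a sketch: the function name above_mean_compact is made up for illustration, and add_suffix is used just to get the a1, b1, c1 column names from the question:

```python
import pandas as pd


def above_mean_compact(df: pd.DataFrame) -> pd.DataFrame:
    """Keep values above each column's mean, moved to the top of the column."""
    masked = df.where(df > df.mean())
    # dropna() removes the NaNs of one column; wrapping .values in a new
    # Series resets its index to 0..n-1, so the columns line up from the
    # top and shorter columns are padded with NaN at the end.
    return masked.apply(lambda col: pd.Series(col.dropna().values))


# The five-row frame from the example above.
df = pd.DataFrame({"a": [4, 1, 7, 3, 3], "b": [5, 2, 9, 6, 6], "c": [6, 3, 2, 8, 8]})

result = above_mean_compact(df).add_suffix("1")
print(result)
```

After this, the positions in a row no longer correspond to the original rows, so the result is only useful when each column is read independently.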

Upvotes: 3
