Zanshin

Reputation: 1272

Applying an operation on a list of dataframes

Building on the question "Select column with only one negative value", I'm trying to adapt its solution to a list of dataframes and select the ones that qualify, but I can't make it work.

In the example below, I want to return the dataframe(s) that have at most one negative value in column 'Z'.

In this case, that is df1.
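For a single dataframe, the condition from the question referenced above comes down to counting the True values of a boolean mask. A minimal sketch on two small hand-made frames (the values are made up for illustration, not the random ones below):

```python
import pandas as pd

# Two small, hand-made frames (illustrative values only)
a = pd.DataFrame({'Z': [1.7, 0.2, -2.6]})   # one negative value in 'Z'
b = pd.DataFrame({'Z': [0.8, -2.1, -0.5]})  # two negative values in 'Z'

# (df['Z'] < 0) is a boolean Series; summing it counts the True entries
neg_a = int((a['Z'] < 0).sum())
neg_b = int((b['Z'] < 0).sum())

print(neg_a, neg_a <= 1)  # 1 True
print(neg_b, neg_b <= 1)  # 2 False
```

Applying that same test to every element of a list is all the answers below do.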

Example:

 import numpy as np
 import pandas as pd

 N = 5

 np.random.seed(0)

 df1 = pd.DataFrame(
         {'X': np.random.uniform(-3, 3, N),
          'Y': np.random.uniform(-3, 3, N),
          'Z': np.random.uniform(-3, 3, N)})

 df2 = pd.DataFrame(
         {'X': np.random.uniform(-3, 3, N),
          'Y': np.random.uniform(-3, 3, N),
          'Z': np.random.uniform(-3, 3, N)})

df1:
          X         Y         Z
0  0.292881  0.875365  1.750350
1  1.291136 -0.374477  0.173370
2  0.616580  2.350638  0.408267
3  0.269299  2.781977  2.553580
4 -0.458071 -0.699351 -2.573784

df2:
          X         Y         Z
0 -2.477224  2.871710  0.839526
1 -2.878690  1.794951 -2.139880
2  1.995719 -0.231124  2.668014
3  1.668941  1.683175  0.131090
4  2.220073 -2.290353 -0.512028

How could I accomplish this? Thanks in advance.

Upvotes: 1

Views: 971

Answers (3)

Alexander

Reputation: 109546

You could just use a conditional list comprehension:

>>> dfs = [df1, df2]
>>> [df for df in dfs if df['Z'].lt(0).sum() <= 1]
[          X         Y         Z
 0  0.292881  0.875365  1.750350
 1  1.291136 -0.374477  0.173370
 2  0.616580  2.350638  0.408267
 3  0.269299  2.781977  2.553580
 4 -0.458071 -0.699351 -2.573784]

The result is a list of each dataframe that satisfies your condition.
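If you also want to know which dataframe qualified (df1 here), one option is to keep the frames in a dict keyed by name and filter with a dict comprehension instead. A sketch that rebuilds the question's data; the keys 'df1' and 'df2' are just illustrative labels:

```python
import numpy as np
import pandas as pd

N = 5
np.random.seed(0)

# Same construction as in the question (X, Y, Z drawn in that order)
df1 = pd.DataFrame({'X': np.random.uniform(-3, 3, N),
                    'Y': np.random.uniform(-3, 3, N),
                    'Z': np.random.uniform(-3, 3, N)})
df2 = pd.DataFrame({'X': np.random.uniform(-3, 3, N),
                    'Y': np.random.uniform(-3, 3, N),
                    'Z': np.random.uniform(-3, 3, N)})

# Keys are illustrative labels, not anything pandas requires
frames = {'df1': df1, 'df2': df2}

# Keep only the entries whose 'Z' column has at most one negative value
selected = {name: df for name, df in frames.items()
            if df['Z'].lt(0).sum() <= 1}

print(list(selected))  # ['df1']
```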

Upvotes: 2

cs95

Reputation: 402433

Count the values below 0 with sum, and yield only the dataframes that have at most one of them.

def foo(df_list):
    # Yield each dataframe whose 'Z' column has at most one negative value
    for df in df_list:
        if (df['Z'] < 0).sum() <= 1:
            yield df

df_list = [df1, df2]
for df in foo(df_list):
    print(df)

          X         Y         Z
0  0.292881  0.875365  1.750350
1  1.291136 -0.374477  0.173370
2  0.616580  2.350638  0.408267
3  0.269299  2.781977  2.553580
4 -0.458071 -0.699351 -2.573784
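Because foo is a generator, nothing is evaluated until you iterate over it. If you only need the first qualifying dataframe, next() with a default stops at the first match and returns None when there is none; list() materializes all matches. A sketch with small hand-made frames (illustrative values only):

```python
import pandas as pd

def foo(df_list):
    # Yield each dataframe whose 'Z' column has at most one negative value
    for df in df_list:
        if (df['Z'] < 0).sum() <= 1:
            yield df

# Illustrative frames: only `ok` qualifies
ok = pd.DataFrame({'Z': [1.0, -2.0, 3.0]})
bad = pd.DataFrame({'Z': [-1.0, -2.0, 3.0]})

first = next(foo([bad, ok]), None)        # first qualifying frame, or None
all_matches = list(foo([bad, ok, bad]))   # every qualifying frame
none_found = next(foo([bad]), None)       # no match, so the default None

print(first is ok)       # True
print(len(all_matches))  # 1
print(none_found)        # None
```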

Upvotes: 5

bluesummers

Reputation: 12607

This would do it:

def func(dataframe_list, on_column):

    returned_list = []
    for df in dataframe_list:
        if (df[on_column] < 0).sum() <= 1:
            returned_list.append(df)

    return returned_list

In your case, call func([df1, df2], on_column='Z').
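Note that a comparison such as df[on_column] < 0 evaluates to False for NaN, so missing values are not counted as negative. A sketch of the same function written as a list comprehension, with the threshold turned into a parameter (max_negatives is a hypothetical name, not part of the original answer):

```python
import numpy as np
import pandas as pd

def func(dataframe_list, on_column, max_negatives=1):
    # max_negatives is a hypothetical parameter added for illustration
    # NaN < 0 evaluates to False, so missing values are not counted
    return [df for df in dataframe_list
            if (df[on_column] < 0).sum() <= max_negatives]

with_nan = pd.DataFrame({'Z': [np.nan, -1.0, np.nan]})  # one negative, two NaN
two_neg = pd.DataFrame({'Z': [-1.0, -2.0, 0.5]})        # two negatives

print(len(func([with_nan, two_neg], on_column='Z')))                   # 1
print(len(func([with_nan, two_neg], on_column='Z', max_negatives=2)))  # 2
```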

Upvotes: 0
