rrai
rrai

Reputation: 43

How can I check for negative values in pandas dataframe which contains different data types?

I have a dataframe which contains mockup data as follows:

   FUND ID  FUND NAME           AMOUNT   client code  Price description   Trade Date    Trade Datetime  
0   10101   Holdings company A  10000.5   1234        124.3   abcd        2020-08-19    2020-08-19 12:30:00 
1   20202   Holdings company B  -2000.5   192         -24.2   abcd        2020-08-20    2020-08-20 12:30:00 
2   30303   Holdings company C  3000.5    123         192     NaN         2020-08-21    2020-08-21 12:30:00 
3   10101   Holdings company A  10000     1234567     5.5     NaN         2020-08-22    2020-08-22 12:30:00
4   20202   Holdings company B  10000.5   9999        3.887   abcd        2020-08-23    2020-08-23 12:30:00

As can be seen above, this dataframe has different data types and I'm struggling to create a list which labels each column True or False depending on whether the column contains any negative values or not. For the columns which contain non-numeric data I would like the list to label this by default as False.

I would like the code to be flexible so that I could apply it to different datasets which have a different number of columns if possible

The desired list based on mockup dataframe above:

negative_list = [False, False, True, False, True, False, False, False]

Or a dataframe that resembles something like this:

     col              isnegative
0    FUND ID          False
1    FUND NAME        False
2    AMOUNT           True
3    client code      False
4    Price            True
5    description      False
6    Trade Date       False
7    Trade Datetime   False

Upvotes: 2

Views: 2030

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use DataFrame.select_dtypes to select the numeric columns then use DataFrame.lt + DataFrame.any and use .reindex with fill_value=False:

s = df.select_dtypes(np.number).lt(0).any()\
      .reindex(df.columns, fill_value=False)\
      .rename_axis("col").reset_index(name='isnegative')

Result:

print(s)
              col  isnegative
0         FUND ID       False
1       FUND NAME       False
2          AMOUNT        True
3     client code       False
4           Price        True
5     description       False
6      Trade Date       False
7  Trade Datetime       False

Upvotes: 3

Rob Raymond
Rob Raymond

Reputation: 31166

A simple comprehension across dtypes() does it

import io
df = pd.read_csv(io.StringIO("""   FUND ID  FUND NAME           AMOUNT   client code  Price description   Trade Date    Trade Datetime  
0   10101   Holdings company A  10000.5   1234        124.3   abcd        2020-08-19    2020-08-19 12:30:00 
1   20202   Holdings company B  -2000.5   192         -24.2   abcd        2020-08-20    2020-08-20 12:30:00 
2   30303   Holdings company C  3000.5    123         192     NaN         2020-08-21    2020-08-21 12:30:00 
3   10101   Holdings company A  10000     1234567     5.5     NaN         2020-08-22    2020-08-22 12:30:00
4   20202   Holdings company B  10000.5   9999        3.887   abcd        2020-08-23    2020-08-23 12:30:00"""), sep="\s\s+", engine="python")

dfc = pd.DataFrame([{k:False if str(t)=="object" else (df.loc[:,[k]]<=0).any().all() 
               for k,t in df.dtypes.to_dict().items()} ]).T.reset_index().rename(columns={"index":"col", 0:"isnegative"})

output

                 col  isnegative
0            FUND ID       False
1          FUND NAME        True
2             AMOUNT       False
3        client code        True
4  Price description       False
5         Trade Date       False
6     Trade Datetime       False

Upvotes: 1

Related Questions