Reputation: 43
I have a dataframe which contains mockup data as follows:
FUND ID FUND NAME AMOUNT client code Price description Trade Date Trade Datetime
0 10101 Holdings company A 10000.5 1234 124.3 abcd 2020-08-19 2020-08-19 12:30:00
1 20202 Holdings company B -2000.5 192 -24.2 abcd 2020-08-20 2020-08-20 12:30:00
2 30303 Holdings company C 3000.5 123 192 NaN 2020-08-21 2020-08-21 12:30:00
3 10101 Holdings company A 10000 1234567 5.5 NaN 2020-08-22 2020-08-22 12:30:00
4 20202 Holdings company B 10000.5 9999 3.887 abcd 2020-08-23 2020-08-23 12:30:00
As can be seen above, this dataframe has different data types and I'm struggling to create a list which labels each column True or False depending on whether the column contains any negative values or not. For the columns which contain non-numeric data I would like the list to label this by default as False.
I would like the code to be flexible so that I could apply it to different datasets which have a different number of columns if possible
The desired list based on mockup dataframe above:
negative_list = [False, False, True, False, True, False, False, False]
Or a dataframe that resembles something like this:
col isnegative
0 FUND ID False
1 FUND NAME False
2 AMOUNT True
3 client code False
4 Price True
5 description False
6 Trade Date False
7 Trade Datetime False
Upvotes: 2
Views: 2030
Reputation: 71689
Use DataFrame.select_dtypes
to select the numeric columns then use DataFrame.lt
+ DataFrame.any
and use .reindex
with fill_value=False
:
s = df.select_dtypes(np.number).lt(0).any()\
.reindex(df.columns, fill_value=False)\
.rename_axis("col").reset_index(name='isnegative')
Result:
print(s)
col isnegative
0 FUND ID False
1 FUND NAME False
2 AMOUNT True
3 client code False
4 Price True
5 description False
6 Trade Date False
7 Trade Datetime False
Upvotes: 3
Reputation: 31166
A simple comprehension across dtypes()
does it
import io
df = pd.read_csv(io.StringIO(""" FUND ID FUND NAME AMOUNT client code Price description Trade Date Trade Datetime
0 10101 Holdings company A 10000.5 1234 124.3 abcd 2020-08-19 2020-08-19 12:30:00
1 20202 Holdings company B -2000.5 192 -24.2 abcd 2020-08-20 2020-08-20 12:30:00
2 30303 Holdings company C 3000.5 123 192 NaN 2020-08-21 2020-08-21 12:30:00
3 10101 Holdings company A 10000 1234567 5.5 NaN 2020-08-22 2020-08-22 12:30:00
4 20202 Holdings company B 10000.5 9999 3.887 abcd 2020-08-23 2020-08-23 12:30:00"""), sep="\s\s+", engine="python")
dfc = pd.DataFrame([{k:False if str(t)=="object" else (df.loc[:,[k]]<=0).any().all()
for k,t in df.dtypes.to_dict().items()} ]).T.reset_index().rename(columns={"index":"col", 0:"isnegative"})
output
col isnegative
0 FUND ID False
1 FUND NAME True
2 AMOUNT False
3 client code True
4 Price description False
5 Trade Date False
6 Trade Datetime False
Upvotes: 1