Reputation: 5450
I have the following dataframe:
In [48]: df.head(10)
Out[48]:
beat1 beat2 beat3 beat4 beat5 beat6 beat7
filename
M46_MI_RhHy61d.dat 0.7951 0.8554 0.9161 1.0789 0.6664 0.7839 0.6076
M60_MI_AH53d.dat 0.7818 0.7380 0.8657 0.9980 0.7491 0.9272 0.8781
M57_Car_AF0489d.dat 1.1040 1.1670 1.7740 1.3080 1.2190 1.0800 1.2390
F62_MI_AH39d.dat 1.2150 0.9360 0.9890 1.1960 0.8420 1.1530 1.1360
F81_MI_DM10d.dat 1.0650 1.1190 1.1330 1.2040 1.1220 1.1640 1.0600
M61_My_508d.dat 0.6963 0.7910 0.6362 0.6938 0.7410 0.7198 0.7060
M69_MI_554d.dat 1.0400 1.0890 1.0190 0.9600 1.0720 1.0870 1.0100
F78_MI_548d.dat 1.1410 1.3290 0.8620 0.0000 1.3160 1.2180 1.2870
F68_MI_AH152d.dat 1.1910 1.1170 1.1030 1.2430 1.0100 0.0000 0.0000
M46_Myo_484d.dat 0.6799 0.7278 0.6808 0.7059 0.7973 0.6956 0.6685
In some cases, some (but not necessarily all) of the values in a row are equal to 0, and I don't know which columns they will appear in for a given row. For example, in the dataframe above, the last two values in the second-to-last row are zero. I want to remove such rows from the dataframe. I could do this if I knew which columns the zeros appear in, but that is exactly what I don't know. Any ideas about how to do this?
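For anyone who wants to reproduce the issue, a smaller frame with the same kind of zero rows can be built like this (made-up values, not my actual data):

import pandas as pd

# hypothetical toy frame: two of the four rows contain at least one zero
df = pd.DataFrame(
    [[0.79, 0.85, 0.91],
     [1.14, 0.86, 0.00],
     [1.19, 0.00, 0.00],
     [0.68, 0.73, 0.68]],
    index=pd.Index(['a.dat', 'b.dat', 'c.dat', 'd.dat'], name='filename'),
    columns=['beat1', 'beat2', 'beat3'],
)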
Upvotes: 1
Views: 769
Reputation: 294218
IIUC:
You want to drop any row with a zero in it?
option 1
pd.DataFrame.mask returns a dataframe with np.nan wherever the boolean array argument is True. I can then call dropna, which by default drops any row that contains a null value.
df.mask(df == 0).dropna()
beat1 beat2 beat3 beat4 beat5 beat6 beat7
filename
M46_MI_RhHy61d.dat 0.7951 0.8554 0.9161 1.0789 0.6664 0.7839 0.6076
M60_MI_AH53d.dat 0.7818 0.7380 0.8657 0.9980 0.7491 0.9272 0.8781
M57_Car_AF0489d.dat 1.1040 1.1670 1.7740 1.3080 1.2190 1.0800 1.2390
F62_MI_AH39d.dat 1.2150 0.9360 0.9890 1.1960 0.8420 1.1530 1.1360
F81_MI_DM10d.dat 1.0650 1.1190 1.1330 1.2040 1.1220 1.1640 1.0600
M61_My_508d.dat 0.6963 0.7910 0.6362 0.6938 0.7410 0.7198 0.7060
M69_MI_554d.dat 1.0400 1.0890 1.0190 0.9600 1.0720 1.0870 1.0100
M46_Myo_484d.dat 0.6799 0.7278 0.6808 0.7059 0.7973 0.6956 0.6685
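If you want to see the intermediate step, a quick sketch of what mask produces before dropna runs (assuming the same df as above):

masked = df.mask(df == 0)     # zeros become NaN, everything else is untouched
masked.isna().any(axis=1)     # boolean Series marking the rows that dropna will remove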
option 2
Use loc to keep the rows where all values are not zero.
df.loc[(df != 0).all(1)]
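Spelled out in two steps (same result, just more explicit; the variable name is mine):

row_has_no_zero = (df != 0).all(axis=1)   # boolean Series indexed by filename
df.loc[row_has_no_zero]                   # keep only the rows where the mask is True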
option 3
Dropping down to numpy gives a lot of efficiency. It's a similar concept to option 2; however, we reconstruct the dataframe from scratch.
v = df.values                                      # underlying numpy array
mask = (v != 0).all(1)                             # True for rows with no zeros
pd.DataFrame(v[mask], df.index[mask], df.columns)  # rebuild from the filtered rows
naive time testing
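The timing output isn't reproduced here, but a sketch of how the three options could be compared with IPython's %timeit (on a frame larger than the ten-row sample; rebuild is just a name I'm using to wrap option 3):

def rebuild(df):
    # option 3 wrapped in a function so it can be timed as a single expression
    v = df.values
    mask = (v != 0).all(1)
    return pd.DataFrame(v[mask], df.index[mask], df.columns)

%timeit df.mask(df == 0).dropna()    # option 1
%timeit df.loc[(df != 0).all(1)]     # option 2
%timeit rebuild(df)                  # option 3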
Upvotes: 3