Reputation: 88236
I have a pd.Series with duplicated indices, and each index containing a set of booleans:
FA155 False
FA155 False
FA155 False
FA155 True
FA155 True
FA155 True
FA155 True
FA155 True
FA155 False
What I'm trying to do for each different index in an efficient way, is to keep only as True the first and last True values of the sequence, and set the rest to False. There can also be False values between those that are True.
So for this sample the result would be:
FA155 False
FA155 False
FA155 False
FA155 True
FA155 False
FA155 False
FA155 False
FA155 True
FA155 False
Any help would be very appreciated.
Upvotes: 9
Views: 1381
Reputation: 323236
This is based on the diff
to get the group starting point , I am using iloc
twice since you need keep the head and tail True
df1=df.copy()
df.loc[df]=df.astype(int).diff().ne(0)[df]
df=df.iloc[::-1]
df1=df1.iloc[::-1]
df.loc[df1]+=df1.astype(int).diff().ne(0)[df1]
df=df.iloc[::-1]
Upvotes: 1
Reputation: 2062
You filter True values and then you aggregate to find the first and last values. Then you can use loc to replace those values in df.
df
is your dataframe. col
is the name of your column with True
and False
values
df["nb"] = range(df.shape[0])
df.reset_index(inplace=True)
elem = (df[df[col]==True].groupby("index")["nb"].agg({ "first_True": 'first', "last_True":"last"})).values
indexes_to_False = sum(elem.tolist(), [])
df.loc[indexes_to_False, col] = False
Then you can drop the column nb
and reindex if you wish
Upvotes: 1
Reputation: 59274
You can use loc
with idxmax
with both your original df
and your inverted df
.
This will yield the index of your first and last True
values. Just set the different indexes to False
afterwards.
For example:
z = sio("""i v
FA154 False
FA155 False
FA155 True
FA155 True
FA155 True
FA155 True
FA155 True
FA155 False
FA156 False
FA156 True
FA156 False
FA156 False
FA156 True""")
df = pd.read_table(z, delim_whitespace=True)
i v
0 FA154 False
1 FA155 False
2 FA155 True
3 FA155 True
4 FA155 True
5 FA155 True
6 FA155 True
7 FA155 False
8 FA156 False
9 FA156 True
10 FA156 False
11 FA156 False
12 FA156 True
idxmax()
Which is the same thing as getting your df
and using reset_index
. Then, get list of indexes for you first (v1
) and last (v2
) True
values:
v1 = df.groupby("i").v.idxmax().values
v2 = df[::-1].groupby("i").v.idxmax().values
And use your logic:
df.loc[v1, "v"] = True & df.loc[v1, "v"]
df.loc[v2, "v"] = True & df.loc[v2, "v"]
df.loc[~df.index.isin(np.concatenate([v1,v2])), "v"] = False
The idea behind using &
is not to accidentally set any False
values to True
.
Result:
>>> df.set_index("i")
v
i
FA154 False
FA155 False
FA155 True
FA155 False
FA155 False
FA155 False
FA155 True
FA155 False
FA156 False
FA156 True
FA156 False
FA156 False
FA156 True
Upvotes: 3