Reputation: 603
I have this pandas dataframe:
df = pd.DataFrame(
{
"col1": [1,1,2,3,3,3,4,5,5,5,5]
}
)
df
I want to add another column that says "last" if the value in col1 doesn't equal the value of col1 in the next row. This is how it should look:
So far, I can create a column that contains True if the value in col1 doesn't equal the value of col1 in the next row, and False otherwise:
df["last_row"] = df["col1"].shift(-1)
df['last'] = df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df
Now something like
df["last_row"] = df["col1"].shift(-1)
df['last'] = "last" if df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df
would be nice, but this is apparently the wrong syntax. How can I manage to do this?
Ultimately, I also want to add numbers that indicate how many times a value has appeared before the current row, while the last value is always marked with "last". It should look like this:
I'm not sure if this is another step in my development or if this requires a new approach. I read that if I want to loop through an array while modifying values, I should use apply(). However, I don't know how to include conditions in this. Can you help me?
Thanks a lot!
Upvotes: 2
Views: 741
Reputation: 2811
Considering that the index is incremental, (1) cumcount each group, then (2) take the max index inside each group and set the string there:
group = df.groupby('col1')
df['last'] = group.cumcount()
df.loc[group['last'].idxmax(), 'last'] = 'last'
#or df.loc[group.apply(lambda x: x.index.max()), 'last'] = 'last'
col1 last
0 1 0
1 1 last
2 2 last
3 3 0
4 3 1
5 3 last
6 4 last
7 5 0
8 5 1
9 5 2
10 5 last
Upvotes: 2
Reputation: 59549
Use .shift to find where things change. Then you can use .where to mask appropriately, then .fillna:
s = df.col1 != df.col1.shift(-1)
df['Update'] = df.groupby(s.cumsum().where(~s)).cumcount().where(~s).fillna('last')
col1 Update
0 1 0
1 1 last
2 2 last
3 3 0
4 3 1
5 3 last
6 4 last
7 5 0
8 5 1
9 5 2
10 5 last
As an aside, update is a method of DataFrames, so you should avoid naming a column 'update'.
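To illustrate the aside, here is a minimal sketch (the tiny frame and its values are made up for the example). With a column named 'update', attribute access still resolves to the DataFrame method, so only bracket access reaches the column:

```python
import pandas as pd

# Small illustrative frame; the values are placeholders for this sketch.
df = pd.DataFrame({"col1": [1, 1, 2]})
df["update"] = ["0", "last", "last"]

# Attribute access finds the DataFrame.update method, not the column.
attr_is_method = callable(df.update)

# Bracket access returns the column as expected.
col_values = df["update"].tolist()
```

This is why a different column name (such as 'Update' above) avoids surprises.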
Upvotes: 2
Reputation: 1811
Another possible solution.
df['update'] = np.where(df['col1'].ne(df['col1'].shift(-1)), 'last', 0)
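For reference, a self-contained sketch of the same idea with its imports. It only covers the first part of the question (marking the last row of each run), and the column name "marker" and the empty-string placeholder are assumptions made for this example, not part of the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True where the next row holds a different value. The final row always
# qualifies, because shift(-1) leaves NaN there.
is_last = df["col1"].ne(df["col1"].shift(-1))

# Vectorised if/else: "last" where the condition holds, "" elsewhere.
# Using two strings keeps the resulting column a single type.
df["marker"] = np.where(is_last, "last", "")
```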
Upvotes: 1
Reputation: 88236
Here's one way. You can obtain a cumulative count based on whether or not the value in col1 differs from that of the previous row, by defining a custom grouper and taking the DataFrameGroupBy.cumcount. Then add last using a similar criterion, this time comparing against the next row with shift(-1):
g = df.col1.ne(df.col1.shift(1)).cumsum()
df['update'] = df.groupby(g).cumcount()
ix = df[df.col1.ne(df.col1.shift(-1))].index
# Int64Index([1, 2, 5, 6, 10], dtype='int64')
df.loc[ix,'update'] = 'last'
col1 update
0 1 0
1 1 last
2 2 last
3 3 0
4 3 1
5 3 last
6 4 last
7 5 0
8 5 1
9 5 2
10 5 last
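The same steps can be wrapped in a small helper, sketched here under the assumption that runs of consecutive equal values are what should be counted. The function name label_runs is hypothetical. Because the grouper is built from changes between neighbouring rows, a value that reappears later starts a fresh count, unlike grouping on col1 directly:

```python
import pandas as pd

def label_runs(values: pd.Series) -> pd.Series:
    """Number each row within its run of equal values; mark the run's final row 'last'."""
    # A new run starts wherever the value differs from the previous row.
    run_id = values.ne(values.shift(1)).cumsum()
    # Position of each row inside its run; object dtype so strings can be mixed in.
    labels = values.groupby(run_id).cumcount().astype(object)
    # A run ends wherever the next row differs (always true for the final row).
    labels[values.ne(values.shift(-1))] = "last"
    return labels

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})
df["update"] = label_runs(df["col1"])

# A value that comes back later is treated as a separate run.
repeated = label_runs(pd.Series([1, 1, 2, 1]))
```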
Upvotes: 3