Reputation: 521
I have the following column:
column
0 10
1 10
2 8
3 8
4 6
5 6
My goal is to count the total number of unique values (3 in this case) and create a new column like the following:
new_column
0 3
1 3
2 2
3 2
4 1
5 1
The numbering starts at the number of unique values (3), and the same number is repeated while the current row equals the previous row in the original column. The number decreases by 1 each time the value changes. Every unique value in the original column has the same number of rows (2 per unique value in this case).
My solution was to group by the original column and build a new list like this:
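import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})
i = 1
new_time = []
for j, v in df.groupby('column'):  # groups come out in ascending key order: 6, 8, 10
    new_time.append([i] * len(v))  # one label per row of the group
    i = i + 1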
Then I'd flatten the list and sort it in decreasing order. Is there a simpler solution?
Thanks.
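For completeness, the flatten-and-sort step described above can be sketched as follows. This is a minimal sketch that assumes, as in the example, that the column is already sorted in descending order; otherwise the sorted labels will not line up with the rows.

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

# groupby sorts keys ascending (6, 8, 10), so the smallest value gets label 1
new_time = [[i] * len(v) for i, (_, v) in enumerate(df.groupby('column'), start=1)]

# flatten [[1, 1], [2, 2], [3, 3]] and sort descending to match the row order
df['new_column'] = sorted(chain.from_iterable(new_time), reverse=True)
print(df['new_column'].tolist())  # [3, 3, 2, 2, 1, 1]
```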
Upvotes: 4
Views: 1017
Reputation: 294218
pd.factorize
i, u = pd.factorize(df.column)
df.assign(new=len(u) - i)
column new
0 10 3
1 10 3
2 8 2
3 8 2
4 6 1
5 6 1
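To see why this works, it helps to look at the two values pd.factorize returns: integer codes assigned in order of first appearance, and the unique values in that same order. A small sketch on the example data:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

codes, uniques = pd.factorize(df['column'])
print(codes.tolist())    # [0, 0, 1, 1, 2, 2]  (10 -> 0, 8 -> 1, 6 -> 2)
print(uniques.tolist())  # [10, 8, 6]

# subtracting from the number of uniques turns 0, 1, 2 into 3, 2, 1
df['new'] = len(uniques) - codes
print(df['new'].tolist())  # [3, 3, 2, 2, 1, 1]
```

Because the codes follow first appearance rather than sorted order, this does not depend on the values being in descending order.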
dict.setdefault
d = {}
for k in df.column:
    d.setdefault(k, len(d))
df.assign(new=len(d) - df.column.map(d))
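The loop relies on len(d) being evaluated before setdefault inserts a key, so each new value gets the next index and repeated values are left alone. A quick check of the intermediate dictionary, sketched on the example data:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

d = {}
for k in df.column:
    # len(d) is computed before the call, so a new key gets the next free index;
    # an existing key keeps its value because setdefault does not overwrite
    d.setdefault(k, len(d))

print(d == {10: 0, 8: 1, 6: 2})  # True
new = len(d) - df.column.map(d)
print(new.tolist())  # [3, 3, 2, 2, 1, 1]
```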
Upvotes: 6
Reputation: 25239
IIUC, you want the group ID of each run of consecutive equal values, numbered in reverse order. If so, this should work too:
df.column.nunique() - df.column.ne(df.column.shift()).cumsum().sub(1)
Out[691]:
0 3
1 3
2 2
3 2
4 1
5 1
Name: column, dtype: int32
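Broken into steps, the expression marks where a new run of equal values starts, numbers the runs with a cumulative sum, and then flips the numbering. The sketch below uses run_id.max() (the number of runs) in place of nunique(); the two agree on this data, but they differ if a value reappears later in the column, e.g. [10, 10, 8, 8, 10, 10], where nunique() is 2 but there are 3 runs.

```python
import pandas as pd

s = pd.Series([10, 10, 8, 8, 6, 6], name='column')

starts = s.ne(s.shift())   # True where a new run begins (the first row compares against NaN)
run_id = starts.cumsum()   # 1-based run number: [1, 1, 2, 2, 3, 3]
new = run_id.max() - run_id + 1  # reverse the numbering: [3, 3, 2, 2, 1, 1]
print(new.tolist())
```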
Upvotes: 1
Reputation: 30605
Actually, we can use rank with method='dense', i.e.
dense: like ‘min’, but rank always increases by 1 between groups
df['column'].rank(method='dense')
0 3.0
1 3.0
2 2.0
3 2.0
4 1.0
5 1.0
A rank version of @cs95's solution would be:
df['column'].ne(df['column'].shift()).cumsum().rank(method='dense',ascending=False)
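The two rank variants are not equivalent. Ranking the values themselves only matches because larger values come first in the example, while ranking the run IDs depends only on row order. A sketch contrasting them on the same values in ascending order (data chosen for illustration):

```python
import pandas as pd

s = pd.Series([6, 6, 8, 8, 10, 10], name='column')

# dense rank of the values: 6 -> 1, 8 -> 2, 10 -> 3 (rank returns floats)
by_value = s.rank(method='dense').astype(int)

# dense rank of the run IDs, largest run ID first: depends only on row order
run_id = s.ne(s.shift()).cumsum()
by_run = run_id.rank(method='dense', ascending=False).astype(int)

print(by_value.tolist())  # [1, 1, 2, 2, 3, 3]
print(by_run.tolist())    # [3, 3, 2, 2, 1, 1]
```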
Upvotes: 3
Reputation: 323226
Try unique and map:
df.column.map(dict(zip(df.column.unique(),reversed(range(df.column.nunique())))))+1
Out[350]:
0 3
1 3
2 2
3 2
4 1
5 1
Name: column, dtype: int64
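The one-liner builds a lookup table from each unique value (in order of first appearance) to a descending label. Written out step by step, as a sketch that produces labels starting at 3 directly instead of adding 1 at the end:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

uniques = df.column.unique()          # order of first appearance: 10, 8, 6
labels = range(len(uniques), 0, -1)   # 3, 2, 1
mapping = dict(zip(uniques, labels))  # maps 10 -> 3, 8 -> 2, 6 -> 1
new = df.column.map(mapping)
print(new.tolist())  # [3, 3, 2, 2, 1, 1]
```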
Upvotes: 2
Reputation: 402303
Use GroupBy.ngroup with ascending=False:
df.groupby('column', sort=False).ngroup(ascending=False)+1
0 3
1 3
2 2
3 2
4 1
5 1
dtype: int64
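The sort=False argument matters here: ngroup numbers groups in the order the groupby iterates them, which is sorted key order by default and order of first appearance with sort=False. A sketch comparing the two on the example data:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 6, 6]})

# default sort=True: groups ordered 6, 8, 10, so 10 ends up with the smallest label
sorted_keys = df.groupby('column').ngroup(ascending=False) + 1
# sort=False: groups ordered by first appearance (10, 8, 6)
first_seen = df.groupby('column', sort=False).ngroup(ascending=False) + 1

print(sorted_keys.tolist())  # [1, 1, 2, 2, 3, 3]
print(first_seen.tolist())   # [3, 3, 2, 2, 1, 1]
```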
For a DataFrame that looks like this,
df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})
...where only consecutive equal values should be grouped together, you'll need to modify the grouper:
(df.groupby(df['column'].ne(df['column'].shift()).cumsum(), sort=False)
.ngroup(ascending=False)
.add(1))
0 3
1 3
2 2
3 2
4 1
5 1
dtype: int64
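To see why the grouper needs changing, compare grouping by value with grouping by run on that DataFrame. Grouping by value merges the two separate runs of 10 into one group, while the ne/shift/cumsum key gives each run its own ID. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})

# grouping by value: both runs of 10 fall into the same group
by_value = df.groupby('column', sort=False).ngroup(ascending=False) + 1

# grouping by run ID: a new ID starts whenever the value changes
run_id = df['column'].ne(df['column'].shift()).cumsum()
by_run = df.groupby(run_id, sort=False).ngroup(ascending=False) + 1

print(by_value.tolist())  # [2, 2, 1, 1, 2, 2]
print(by_run.tolist())    # [3, 3, 2, 2, 1, 1]
```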
Upvotes: 5