Reputation: 359
I've currently switched my focus from R to Python. I work with data.table in R a lot, and I find it sometimes quite difficult to find an equivalent for some functions in Python.
I have a pandas data frame that looks like this:
df = pd.DataFrame({'A':['abc','def', 'def', 'abc', 'def', 'def','abc'],'B':[13123,45,1231,463,142131,4839, 4341]})
A B 0 abc 13123 1 def 45 2 def 1231 3 abc 463 4 def 142131 5 def 4839 6 abc 4341
I need to create a column that increments from 1 based on A and B, so that it indicates the increasing order of B. So I first create the sorted data frame, and the column I'm interested in creating is C as below:
A B C 1 abc 463 1 6 abc 4341 2 0 abc 13123 3 3 def 45 1 2 def 1231 2 5 def 4839 3 4 def 142131 4
In R, using the library(data.table), this can be easily done in one line and creates a column within the original data table:
df[, C := 1:.N, by=A]
I've looked around and I think I might be able to make use of something like this:
df.groupby('A').size()
or
df['B'].argsort()
but not sure how to proceed from here, and how to join the new column back to the original data frame. It would be very helpful if anyone could give me any pointer.
Many thanks!
Upvotes: 6
Views: 3419
Reputation: 4775
Index magic seems to be another way:
df['C']=df.sort(['A','B'],inplace=True).groupby('A').reset_index().index.labels[1]
Upvotes: 0
Reputation: 69136
In [61]: df
Out[61]:
A B
1 abc 463
6 abc 4341
0 abc 13123
3 def 45
2 def 1231
5 def 4839
4 def 142131
In [62]: df['C'] = df.groupby('A')['A'].transform(lambda x: pd.Series(range(1, len(x)+1), index=x.index))
In [63]: df
Out[63]:
A B C
1 abc 463 1
6 abc 4341 2
0 abc 13123 3
3 def 45 1
2 def 1231 2
5 def 4839 3
4 def 142131 4
Upvotes: 6
Reputation: 59612
And for comparison, the correct data.table
syntax is :
df[, C := 1:.N, by=A]
This adds a new column C by reference to df
. The :=
operator is part of the data.table
package for R. It allows you to add and remove columns and assign to subsets of data.table
, by group, by reference with no copy at all.
Upvotes: 1