Reputation: 510
I have a DataFrame like the one shown below:
DF1 =
sID  token    A  B  C  D
10   I am     a  f  g  h
10   here     a  g  g  h
10   whats    a  h  g  h
10   going    a  o  g  h
10   on       a  j  g  h
10   .        a  f  g  h
11   I am     a  f  g  h
11   foo bar  a  f  g  h
12   You are  a  f  g  h
...
The columns (A-D) don't matter for this task. Is there a way to add a counter column to the DataFrame that counts the words (delimited by whitespace)? The counter should number the tokens within each sID, meaning it resets every time the value of sID changes.
Usually I would just use DF.groupby("sID").cumcount(), but this only counts the rows for each sID.
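To make the limitation concrete, here is a minimal sketch. The frame is rebuilt from the table above with columns A-D left out, and the names row_counter and words_per_row are only illustrative. It shows that cumcount numbers rows within each sID and takes no account of how many words each row holds.

```python
import pandas as pd

df = pd.DataFrame({
    "sID": [10, 10, 10, 10, 10, 10, 11, 11, 12],
    "token": ["I am", "here", "whats", "going", "on", ".",
              "I am", "foo bar", "You are"],
})

# cumcount numbers the rows within each sID group, starting at 0
row_counter = df.groupby("sID").cumcount().tolist()

# Number of whitespace-delimited words per row, which cumcount ignores
words_per_row = df["token"].str.split().str.len().tolist()

print(row_counter)
print(words_per_row)
```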
The result should look like this:
DF2 =
sID  token    A  B  C  D  Counter
10   I am     a  f  g  h  0 1
10   here     a  g  g  h  2
10   whats    a  h  g  h  3
10   going    a  o  g  h  4
10   on       a  j  g  h  5
10   .        a  f  g  h  6
11   I am     a  f  g  h  0 1
11   foo bar  a  f  g  h  2 3
12   You are  a  f  g  h  0 1
...
Upvotes: 1
Views: 1523
Reputation: 164653
Using groupby + itertools:
from itertools import chain, count
import pandas as pd

df = pd.DataFrame({'sID': [10, 10, 10, 10, 10, 10, 11, 11, 12],
                   'token': ['I am', 'here', 'whats', 'going',
                             'on', '.', 'I am', 'foo bar', 'You are']})

def counter(df):
    for k, g in df.groupby('sID')['token']:
        c = count()                    # fresh counter for each sID group
        lens = g.str.split().map(len)  # number of words in each row
        yield [' '.join([str(next(c)) for _ in range(n)]) for n in lens]

# Assigned positionally, so rows must already be ordered by sID
df['Counts'] = list(chain.from_iterable(counter(df)))
Result
print(df)
   sID    token Counts
0   10     I am    0 1
1   10     here      2
2   10    whats      3
3   10    going      4
4   10       on      5
5   10        .      6
6   11     I am    0 1
7   11  foo bar    2 3
8   12  You are    0 1
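Explanation
Use itertools.count to create a fresh counter for each group.
Use str.split and len to get the number of words in each row.
Use itertools.chain to flatten the per-group lists into a single column.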
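As a rough illustration of the inner step, the sketch below runs the same logic on a single group. The Series g is a hypothetical stand-in for the tokens of one sID and is not taken from the answer. Because one itertools.count object is shared by every row of the group, the numbering carries on from row to row instead of restarting.

```python
from itertools import count

import pandas as pd

# Hypothetical tokens belonging to a single sID group
g = pd.Series(["I am", "here", "whats"])

c = count()                    # yields 0, 1, 2, ... on successive next() calls
lens = g.str.split().map(len)  # words per row

# Pull as many numbers from the shared counter as each row has words
labels = [" ".join(str(next(c)) for _ in range(n)) for n in lens]

print(labels)
```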
Upvotes: 3
Reputation: 29635
Before using groupby("sID").cumcount(), you need to do some manipulation to keep track of which row each word belongs to once the tokens are split. You can create the 'Counter' column like this:
df['Counter'] = (df.set_index('sID', append=True)['token']
                   .str.split(' ', expand=True).stack()
                   .dropna()  # drop padding from shorter rows if stack() kept it
                   .groupby('sID').cumcount()
                   .groupby(level=0).apply(lambda x: ' '.join([str(i) for i in x])))
This gives the expected output.
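For readers who want to see what each stage of the chain produces, here is a sketch that splits it into named steps on a smaller, made-up frame. The variable names words, numbered and counter are only illustrative, and the explicit dropna() is a defensive assumption for pandas versions where stack() keeps the empty cells created by expand=True.

```python
import pandas as pd

df = pd.DataFrame({"sID": [10, 10, 11],
                   "token": ["I am", "here", "foo bar"]})

# One entry per word, indexed by (original row, sID, word position)
words = (df.set_index("sID", append=True)["token"]
           .str.split(" ", expand=True)
           .stack()
           .dropna())

# Running word number within each sID
numbered = words.groupby("sID").cumcount()

# Join the numbers back together per original row (index level 0)
counter = numbered.groupby(level=0).apply(lambda x: " ".join(map(str, x)))

df["Counter"] = counter
print(df)
```

Keeping the original row label as index level 0 is what allows the final groupby to collapse the words back onto the row they came from.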
Upvotes: 3