Reputation: 2565
I have a pandas DataFrame that represents a list of sentences. Every row is a word, and each word has an ID corresponding to its position in the sentence.
It looks something like:
ID FORM
0 1 A
1 2 word
2 3 in
3 4 the
4 5 first
5 6 sentence
6 7 .
7 1 The
8 2 second
9 3 sentence
10 4 .
11 1 the
12 2 third
13 3 sentence
...
How can I add an extra column named "Sentence" that indicates which sentence each word belongs to, so that my DataFrame looks like this:
ID FORM Sentence
0 1 A 1
1 2 word 1
2 3 in 1
3 4 the 1
4 5 first 1
5 6 sentence 1
6 7 . 1
7 1 The 2
8 2 second 2
9 3 sentence 2
10 4 . 2
11 1 the 3
12 2 third 3
13 3 sentence 3
I can do it by iterating over the DataFrame and building a Series manually, but that looks ugly and not very Pythonic. Is there a nice way to have pandas do it for me?
Upvotes: 2
Views: 63
Reputation: 28644
I would use the rows where ID equals 1, along with cumsum, to get the sentence numbers:
df.assign(Sentence=df.ID.eq(1).cumsum())
ID FORM Sentence
0 1 A 1
1 2 word 1
2 3 in 1
3 4 the 1
4 5 first 1
5 6 sentence 1
6 7 . 1
7 1 The 2
8 2 second 2
9 3 sentence 2
10 4 . 2
11 1 the 3
12 2 third 3
13 3 sentence 3
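For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, and it assumes ID always restarts at exactly 1 at the start of every sentence.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: ID restarts at 1 per sentence).
df = pd.DataFrame({
    "ID":   [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 1, 2, 3],
    "FORM": ["A", "word", "in", "the", "first", "sentence", ".",
             "The", "second", "sentence", ".", "the", "third", "sentence"],
})

# eq(1) is True on the first word of each sentence;
# the running sum of those flags numbers the sentences 1, 2, 3, ...
df = df.assign(Sentence=df["ID"].eq(1).cumsum())
print(df)
```

Note that assign returns a new DataFrame, so the result has to be bound to a name (as above) if you want to keep it.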
Upvotes: 1
Reputation: 323226
Let us try shift with cumsum:
df['st']=df['FORM'].eq('.').shift().cumsum().fillna(0)+1
df
Out[385]:
ID FORM st
0 1 A 1.0
1 2 word 1.0
2 3 in 1.0
3 4 the 1.0
4 5 first 1.0
5 6 sentence 1.0
6 7 . 1.0
7 1 The 2.0
8 2 second 2.0
9 3 sentence 2.0
10 4 . 2.0
11 1 the 3.0
12 2 third 3.0
13 3 sentence 3.0
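The floats in the output come from the NaN that shift introduces in the first row. A possible variant, sketched here on a hand-built copy of the question's data, passes fill_value=False to shift so the column stays integer. It assumes every sentence except possibly the last ends with a "." token.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID":   [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 1, 2, 3],
    "FORM": ["A", "word", "in", "the", "first", "sentence", ".",
             "The", "second", "sentence", ".", "the", "third", "sentence"],
})

# Flag each "." token, then shift the flag down one row so it marks
# the first word of the next sentence. fill_value=False avoids the NaN
# in the first row, which keeps the Series boolean and the sum integer.
starts_after_period = df["FORM"].eq(".").shift(fill_value=False)
df["st"] = starts_after_period.cumsum() + 1
print(df)
```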
Upvotes: 4
Reputation: 8219
Try this:
df['Sentence']=(df['ID'].diff()<0).cumsum()
df
which produces:
ID FORM Sentence
-- ---- -------- ----------
0 1 A 0
1 2 word 0
2 3 in 0
3 4 the 0
4 5 first 0
5 6 sentence 0
6 7 . 0
7 1 The 1
8 2 second 1
9 3 sentence 1
10 4 . 1
11 1 the 2
12 2 third 2
13 3 sentence 2
Here (df['ID'].diff()<0) is a Boolean Series that is True whenever the ID decreases, and .cumsum() increments by 1 every time this happens.
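The output above numbers sentences from 0, while the question asks for numbering from 1. As a rough sketch on a hand-built copy of the question's data, the intermediate steps can be spelled out and 1 added at the end. The names id_drops and df itself are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID":   [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 1, 2, 3],
    "FORM": ["A", "word", "in", "the", "first", "sentence", ".",
             "The", "second", "sentence", ".", "the", "third", "sentence"],
})

# diff() gives the change in ID from the previous row (NaN for the first row).
# The comparison NaN < 0 evaluates to False, so the first row is not flagged.
id_drops = df["ID"].diff() < 0

# Each drop in ID marks a new sentence; add 1 so numbering starts at 1.
df["Sentence"] = id_drops.cumsum() + 1
print(df)
```

Unlike the eq(1) check, this only relies on ID decreasing at a sentence boundary, so it would also cope with sentences whose IDs start at something other than 1.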
Upvotes: 4