Reputation: 593
I have a dataframe df
with the following ids (in Col
).
The last occurrence of A/B/C represents the start, and the last occurrence of X is the end. I should ignore any other A,B,C between start and end (e.g. rows 8 and 9).
I have to find start and end records from this data and assign a number to each of these occurrences. The column count
is my desired output:
Col ID
P
Q
A
A
A 1
Q 1
Q 1
B 1
C 1
S 1
S 1
X 1
X 1
X 1
Q
Q
R
R
C
C 2
D 2
E 2
B 2
K 2
D 2
E 2
E 2
X 2
X 2
This code:
lc1 = df.index[df.Col.eq('A') & df.Col.ne(df.Col.shift(-1))]
would give me an array of all the last occurrences of Index values of 'A', in this case [5]
.
lc1 = df.index[df.Col.eq('C') & df.Col.ne(df.Col.shift(-1))] # [20]
lc2 = df.index[df.Col.eq('X') & df.Col.ne(df.Col.shift(-1))] # [14,29]
I would use iloc
to print the count values:
df.iloc[5:14]['count'] = 1
df.iloc[20:29]['count'] = 2
How can I find the indices of A/B/C together and print the count values of each start and end occurrence?
Upvotes: 4
Views: 594
Reputation: 3212
To find your indices of A, B, and C you can do:
df[(df.Col =='A')|(df.Col =='B')|(df.Col =='C')].index
Print your start counts:
df1 = df[df['count'] != df['count'].shift(+1)]
print df1[df1['count'] != 0]['count']
Print your end counts:
df2 = df[df['count'] != df['count'].shift(-1)]
print df2[df2['count'] != 0]['count']
On a sidenote, calling a column count
is a bad idea because count
is a method of the DataFrame and then you get ambiguity when doing df.count
.
EDIT: Corrected since I was answering to a wrong question.
Upvotes: 2