ckp
ckp

Reputation: 593

How can I find start and end occurrence of character in Python

I have a dataframe df with the following ids (in Col). The last occurrence of A/B/C represents the start, and the last occurrence of X is the end. I should ignore any other A,B,C between start and end (e.g. rows 8 and 9).

I have to find start and end records from this data and assign a number to each of these occurrences. The column count is my desired output:

    Col         ID
    P
    Q
    A             
    A              
    A             1
    Q             1 
    Q             1 
    B             1 
    C             1
    S             1 
    S             1
    X             1
    X             1  
    X             1 
    Q
    Q
    R
    R
    C              
    C             2 
    D             2 
    E             2 
    B             2 
    K             2 
    D             2 
    E             2 
    E             2 
    X             2 
    X             2 

This code:

lc1 = df.index[df.Col.eq('A') & df.Col.ne(df.Col.shift(-1))] 

would give me an array of all the last occurrences of Index values of 'A', in this case [5].

lc1 = df.index[df.Col.eq('C') & df.Col.ne(df.Col.shift(-1))]  # [20]
lc2 = df.index[df.Col.eq('X') & df.Col.ne(df.Col.shift(-1))]  # [14,29]

I would use iloc to print the count values:

df.iloc[5:14]['count'] = 1
df.iloc[20:29]['count'] = 2

How can I find the indices of A/B/C together and print the count values of each start and end occurrence?

Upvotes: 4

Views: 594

Answers (1)

Khris
Khris

Reputation: 3212

To find your indices of A, B, and C you can do:

df[(df.Col =='A')|(df.Col =='B')|(df.Col =='C')].index

Print your start counts:

df1 = df[df['count'] != df['count'].shift(+1)]
print df1[df1['count'] != 0]['count']

Print your end counts:

df2 = df[df['count'] != df['count'].shift(-1)]
print df2[df2['count'] != 0]['count']

On a sidenote, calling a column count is a bad idea because count is a method of the DataFrame and then you get ambiguity when doing df.count.

EDIT: Corrected since I was answering to a wrong question.

Upvotes: 2

Related Questions