Are there way to get consecutive rows according to value of specific column? For example: column1 column2 View row1 1 2 c row2 3 4 a row3 5 6 p row4 7 8 p row5 9 10 n I need to get the rows that have the letter of word 'app' as View, so in this example I need to save row2, row3 and row4 in a list.

Reputation: 37

Get the rows of dataframe based on the consecutive values of one column

Are there way to get consecutive rows according to value of specific column? For example:

	column1	column2	View
row1	1	2	c
row2	3	4	a
row3	5	6	p
row4	7	8	p
row5	9	10	n

I need to get the rows that have the letter of word 'app' as View, so in this example I need to save row2, row3 and row4 in a list.

Upvotes: 1

Answers (3)

Joe Carboni

Reputation: 81

Here is a generalizable approach. I use index_slice_by_substring() to generate a tuple of integers representing the beginning and ending row. The function rows_by_consecutive_letters() takes your dataframe, the column name to check, and the string you want to look for, and for the return value it utilizes .iloc to grab a slice of the table by integer values.

The key to getting the slice indices is joining the "View" column values together into a single string using ''.join(df[column]) and checking substrings of the same length as the condition string from left to right until there's a match

def index_slice_by_substring(full_string, substring) -> tuple:
    len_substring = len(substring)
    len_full_string = len(full_string)
    for x0, x1 in enumerate(range(len_substring,len_full_string)):
        if full_string[x0:x1] == substring:
            return (x0,x1)

def rows_by_consecutive_letters(df, column, condition) -> pd.DataFrame:
    row_begin, row_end = index_slice_by_substring(''.join(df[column]), condition)
    return df.iloc[row_begin:row_end,:]

print(rows_by_consecutive_letters(your_df,"View","app"))

Returns:

   column1  column2 View
1        3        4    a
2        5        6    p
3        7        8    p

Upvotes: 2

Niv Dudovitch

Reputation: 1658

Not the pythonic way, but doing the work:

keep = []
for i in range(len(df) - 2):
    if (df.View[i]=='a') & (df.View[i+1] =='p') & (df.View[i+2] =='p'):
        keep.append(df[i])
        keep.append(df[i+1])
        keep.append(df[i+2])

Result:

Upvotes: 1

Michael Szczesny

Reputation: 5036

You can use str.find but this only finds the first occurrence of your search term.

search = 'app'
i = ''.join(df.View).find(search)
if i>-1:
    print(df.iloc[i: i+len(search)])

Output

      column1  column2 View                         
row2        3        4    a
row3        5        6    p
row4        7        8    p

To find none (without error checking), one and all occurrences you can use re.finditer. The result is a list of dataframe slices.

import re
search='p'   # searched for 'p' to find more than one
[df.iloc[x.start():x.end()] for x in re.finditer(search, ''.join(df.View))]

Output

[      column1  column2 View                        
 row3        5        6    p,
       column1  column2 View                         
 row4        7        8    p]

Upvotes: 1

Get the rows of dataframe based on the consecutive values of one column

Answers (3)

Related Questions