Reputation: 37
Are there way to get consecutive rows according to value of specific column? For example:
column1 | column2 | View | |
---|---|---|---|
row1 | 1 | 2 | c |
row2 | 3 | 4 | a |
row3 | 5 | 6 | p |
row4 | 7 | 8 | p |
row5 | 9 | 10 | n |
I need to get the rows that have the letter of word 'app' as View, so in this example I need to save row2, row3 and row4 in a list.
Upvotes: 1
Views: 1368
Reputation: 81
Here is a generalizable approach. I use index_slice_by_substring()
to generate a tuple of integers representing the beginning and ending row. The function rows_by_consecutive_letters()
takes your dataframe, the column name to check, and the string you want to look for, and for the return value it utilizes .iloc
to grab a slice of the table by integer values.
The key to getting the slice indices is joining the "View" column values together into a single string using ''.join(df[column])
and checking substrings of the same length as the condition string from left to right until there's a match
def index_slice_by_substring(full_string, substring) -> tuple:
len_substring = len(substring)
len_full_string = len(full_string)
for x0, x1 in enumerate(range(len_substring,len_full_string)):
if full_string[x0:x1] == substring:
return (x0,x1)
def rows_by_consecutive_letters(df, column, condition) -> pd.DataFrame:
row_begin, row_end = index_slice_by_substring(''.join(df[column]), condition)
return df.iloc[row_begin:row_end,:]
print(rows_by_consecutive_letters(your_df,"View","app"))
Returns:
column1 column2 View
1 3 4 a
2 5 6 p
3 7 8 p
Upvotes: 2
Reputation: 1658
Not the pythonic way, but doing the work:
keep = []
for i in range(len(df) - 2):
if (df.View[i]=='a') & (df.View[i+1] =='p') & (df.View[i+2] =='p'):
keep.append(df[i])
keep.append(df[i+1])
keep.append(df[i+2])
Result:
Upvotes: 1
Reputation: 5036
You can use str.find
but this only finds the first occurrence of your search term.
search = 'app'
i = ''.join(df.View).find(search)
if i>-1:
print(df.iloc[i: i+len(search)])
Output
column1 column2 View
row2 3 4 a
row3 5 6 p
row4 7 8 p
To find none (without error checking), one and all occurrences you can use re.finditer
. The result is a list of dataframe slices.
import re
search='p' # searched for 'p' to find more than one
[df.iloc[x.start():x.end()] for x in re.finditer(search, ''.join(df.View))]
Output
[ column1 column2 View
row3 5 6 p,
column1 column2 View
row4 7 8 p]
Upvotes: 1