SFC

Reputation: 793

Keep rows that start with certain text strings

Background

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Text': ['\n[SPORTS FAN]\nHere',
                            'Nothing here',
                            '\n[BASEBALL]\nTHIS SOUNDS right',
                            '\n[SPORTS FAN]\nLikes sports',
                            'Nothing is here',
                            '\n[NOT SPORTS]\nTHIS SOUNDS good',
                            '\n[SPORTS FAN]\nReally Big big fan',
                            '\n[BASEBALL]\nRARELY IS a fan'],
                   'P_ID': [1, 2, 3, 4, 5, 6, 7, 8],
                   'P_Name': ['J J SMITH',
                              'J J SMITH',
                              'J J SMITH',
                              'J J SMITH',
                              'MARY HYDER',
                              'MARY HYDER',
                              'MARY HYDER',
                              'MARY HYDER']})

Output

   P_ID  P_Name      Text
0  1     J J SMITH   \n[SPORTS FAN]\nHere
1  2     J J SMITH   Nothing here
2  3     J J SMITH   \n[BASEBALL]\nTHIS SOUNDS right
3  4     J J SMITH   \n[SPORTS FAN]\nLikes sports
4  5     MARY HYDER  Nothing is here
5  6     MARY HYDER  \n[NOT SPORTS]\nTHIS SOUNDS good
6  7     MARY HYDER  \n[SPORTS FAN]\nReally Big big fan
7  8     MARY HYDER  \n[BASEBALL]\nRARELY IS a fan

Goal

Keep only the rows whose Text starts with '\n[SPORTS FAN]\n' or '\n[BASEBALL]\n'.

Desired Output

   P_ID  P_Name      Text
0  1     J J SMITH   \n[SPORTS FAN]\nHere
2  3     J J SMITH   \n[BASEBALL]\nTHIS SOUNDS right
3  4     J J SMITH   \n[SPORTS FAN]\nLikes sports
6  7     MARY HYDER  \n[SPORTS FAN]\nReally Big big fan
7  8     MARY HYDER  \n[BASEBALL]\nRARELY IS a fan

Question

How do I achieve my desired output?

Upvotes: 1

Views: 1061

Answers (1)

Brendan McDonald

Reputation: 413

Try this:

df_new = df.loc[df['Text'].str.startswith('\n[SPORTS FAN]') | df['Text'].str.startswith('\n[BASEBALL]')]

No regex is required, since str.startswith matches the prefix literally.
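If the list of tags grows, the same idea can be generalized. The sketch below assumes a hypothetical `prefixes` list holding the full tags (including the trailing '\n', which is slightly stricter than the answer above). Option 1 ORs together one `str.startswith` mask per prefix, exactly as the answer does by hand. Option 2 is a swapped-in regex alternative using `str.match`, which anchors at the start of the string; `re.escape` is needed so that `[` and `]` are treated literally instead of as a character class.

```python
import re
from functools import reduce

import pandas as pd

df = pd.DataFrame({'Text': ['\n[SPORTS FAN]\nHere',
                            'Nothing here',
                            '\n[BASEBALL]\nTHIS SOUNDS right',
                            '\n[SPORTS FAN]\nLikes sports',
                            'Nothing is here',
                            '\n[NOT SPORTS]\nTHIS SOUNDS good',
                            '\n[SPORTS FAN]\nReally Big big fan',
                            '\n[BASEBALL]\nRARELY IS a fan'],
                   'P_ID': [1, 2, 3, 4, 5, 6, 7, 8],
                   'P_Name': ['J J SMITH'] * 4 + ['MARY HYDER'] * 4})

# Hypothetical list of tags to keep; extend as needed.
prefixes = ['\n[SPORTS FAN]\n', '\n[BASEBALL]\n']

# Option 1 (no regex): build one boolean mask per prefix and OR them together.
mask = reduce(lambda a, b: a | b,
              (df['Text'].str.startswith(p) for p in prefixes))
df_new = df.loc[mask]

# Option 2 (regex): str.match only matches at the start of each string.
# re.escape turns '[' and ']' (and the newlines) into literal matches.
pattern = '|'.join(re.escape(p) for p in prefixes)
df_regex = df.loc[df['Text'].str.match(pattern)]

print(df_new)
```

Both options keep the original index labels (0, 2, 3, 6, 7), matching the desired output; call `.reset_index(drop=True)` afterwards if a fresh 0..n-1 index is preferred.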

Upvotes: 2
