Split pandas dataframe by String

Question

I'm new to using Pandas dataframes. I have data in a .csv like this:

foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345

I'm reading it into the dataframe with

 df = pd.read_csv

But what I really want is a new dataframe (or a way of splitting the current one) every time I have a "New Entry" line (obviously without including it). How could this be done?

Zero · Accepted Answer

1) One approach is to split on the fly: read the file line by line and start a new chunk every time you hit a NewEntry line.

2) The other way, if the dataframe already exists, is to find the NewEntry rows and slice the dataframe into several smaller ones, stored in a dict dff = {}. Suppose df looks like this:

df                                                                 
        col1  col2  
0        foo  1234    
1        bar  4567                
2      stuff  7894                                                        
3   NewEntry   NaN                       
4  morestuff  1345

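Find the positions of the NewEntry rows, then add -1 at the front and len(df.index) at the end as boundaries, so the first and last chunks can be sliced the same way as the ones in between.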

import numpy as np
rows = [-1] + np.where(df['col1'] == 'NewEntry')[0].tolist() + [len(df.index)]
rows
[-1, 3, 5]

Create the dict of dataframes

dff = {}
for i, r in enumerate(rows[:-1]):
    # take the rows after boundary r, up to (not including) the next boundary
    dff[i] = df.iloc[r + 1:rows[i + 1]]

This gives a dict of dataframes, {0: dataframe1, 1: dataframe2}:

dff                           
{0:     col1  col2            
 0    foo  1234               
 1    bar  4567               
 2  stuff  7894, 1:         col1  col2  
 4  morestuff  1345}
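
As an alternative to the explicit boundary list, the same split can probably be done with a cumulative sum over the separator mask plus groupby. This is only a sketch under assumptions: the sample frame is rebuilt by hand to mirror the one above, and the names is_marker and chunk_id are made up for illustration.

```python
import pandas as pd

# Sample frame mirroring the one in the answer (built by hand for this sketch).
df = pd.DataFrame({
    "col1": ["foo", "bar", "stuff", "NewEntry", "morestuff"],
    "col2": [1234, 4567, 7894, None, 1345],
})

# True on every separator row.
is_marker = df["col1"] == "NewEntry"

# Running count of separators seen so far; it labels each row's chunk: 0, 0, 0, 1, 1.
chunk_id = is_marker.cumsum()

# Drop the separator rows, then group the remaining rows by their chunk label.
kept = df[~is_marker]
dff = {int(key): part for key, part in kept.groupby(chunk_id[~is_marker])}
```

One difference from the loop: if two separator rows follow each other directly, the loop stores an empty dataframe for the gap, while this version just skips that key.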

Dataframe 1

dff[0]              
    col1  col2      
0    foo  1234      
1    bar  4567      
2  stuff  7894

Dataframe 2

dff[1]              
        col1  col2  
4  morestuff  1345
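
For completeness, approach 1 (splitting while reading the file) could look roughly like the sketch below. It is an illustration, not a tested recipe for your real file: the function names split_on_marker and _to_frame, the column names col1/col2, and the "New Entry" spelling (taken from the CSV in the question) are all assumptions, and the in-memory sample stands in for the actual .csv.

```python
import csv
import io

import pandas as pd


def _to_frame(block):
    """Build a two-column DataFrame from a list of (name, value) string pairs."""
    frame = pd.DataFrame(block, columns=["col1", "col2"])
    # Values arrive as strings; anything non-numeric becomes NaN.
    frame["col2"] = pd.to_numeric(frame["col2"], errors="coerce")
    return frame


def split_on_marker(lines, marker="New Entry"):
    """Return a list of DataFrames, one per run of rows between marker rows."""
    frames, block = [], []
    for fields in csv.reader(lines, skipinitialspace=True):
        if not fields:
            continue  # blank line
        if fields[0].strip() == marker:
            # Close the current chunk (if it has rows) and start a new one.
            if block:
                frames.append(_to_frame(block))
            block = []
        else:
            # Keep the first two fields; pad rows that have only one.
            name, value = (fields + [""])[:2]
            block.append((name.strip(), value.strip()))
    if block:
        frames.append(_to_frame(block))
    return frames


# In-memory stand-in for the CSV shown in the question.
sample = io.StringIO(
    "foo, 1234,\nbar, 4567\nstuff, 7894\nNew Entry,,\nmorestuff,1345\n"
)
frames = split_on_marker(sample)
```

With a real file you would pass the open file object instead of the StringIO. The marker rows never reach a dataframe, so nothing has to be filtered out afterwards.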
