rogerwhite

Reputation: 345

Python Pandas Split DF

Please review the code below. Is there a more efficient way of splitting one DataFrame into two? As written, the query is run twice. Would it be faster to run the query once and send matching rows to df1 and everything else to df2? Or, once df1 is created, is there some way to say that df2 = df minus df1?

code:

import pandas as pd

x1 = 'john'
# file is the path to the input text file
df = pd.read_csv(file, sep='\n', header=None, engine='python', quoting=3)
df = df[0].str.strip(' \t"').str.split('[,|;: \t]+', n=1, expand=True).rename(columns={0: 'email', 1: 'data'})
df1 = df[df.email.str.startswith(x1)]
df2 = df[~df.email.str.startswith(x1)]

Upvotes: 1

Views: 77

Answers (1)

timgeb

Reputation: 78650

There's no need to compute the mask df.email.str.startswith(x1) twice. Compute it once and reuse it for both halves:

mask = df.email.str.startswith(x1)
df1 = df[mask].copy()   # .copy() avoids a SettingWithCopyWarning on later assignments
df2 = df[~mask].copy()  # see https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
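
For anyone who wants to try this without the original file, here is a minimal, self-contained sketch. The sample rows and the name df2_alt are made up for illustration and are not from the question. It also shows one possible reading of "df2 = df minus df1" using DataFrame.drop, which assumes the index labels of df are unique. Passing na=False to str.startswith is an added precaution so that missing emails produce False rather than NaN in the mask.

```python
import pandas as pd

# Hypothetical sample data standing in for the parsed file in the question.
df = pd.DataFrame({
    "email": ["john@example.com", "mary@example.com",
              "johnny@example.com", "alice@example.com"],
    "data": ["a", "b", "c", "d"],
})
x1 = "john"

# Evaluate the string test once; na=False treats missing emails as non-matches.
mask = df["email"].str.startswith(x1, na=False)

# Reuse the same mask for both halves; .copy() makes them independent frames.
df1 = df[mask].copy()
df2 = df[~mask].copy()

# Alternative "df minus df1": drop the rows whose index labels are in df1.
# This only works as intended when df's index labels are unique.
df2_alt = df.drop(df1.index)
```

Both routes give the same df2 here. Reusing the mask is probably the simpler choice, since it does not depend on the index being unique.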

Upvotes: 2
