Python Pandas Split DF

Question

Please review the code below. Is there a more efficient way to split one DataFrame into two? As written, the startswith check runs twice. Would it be faster to run it once and send matching rows to df1 and everything else to df2? Or, once df1 has been created, is there some way to say that df2 = df minus df1?

code:

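import pandas as pd

x1 = 'john'
# Read each line of the file as a single column
df = pd.read_csv(file, sep='\n', header=None, engine='python', quoting=3)
# Strip spaces, tabs and quotes, then split once on the first run of delimiters
df = df[0].str.strip(' \t"').str.split('[,|;: \t]+', n=1, expand=True).rename(columns={0: 'email', 1: 'data'})
df1 = df[df.email.str.startswith(x1)]
df2 = df[~df.email.str.startswith(x1)]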

timgeb · Accepted Answer

There's no need to compute the mask df.email.str.startswith(x1) twice. Compute it once, then index with the mask and with its negation:

mask = df.email.str.startswith(x1)
df1 = df[mask].copy()   # .copy() avoids a SettingWithCopyWarning if df1 or df2 is modified later
df2 = df[~mask].copy()  # see https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
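
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The sample rows are made up for illustration, and df2_alt is a hypothetical name. It also shows one way to express the "df minus df1" idea from the question with DataFrame.drop, which only works as intended if the index of df is unique.

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
df = pd.DataFrame({
    "email": ["john@a.com", "mary@b.com", "johnny@c.com", "bob@d.com"],
    "data": ["x", "y", "z", "w"],
})
x1 = "john"

# Evaluate the condition once and reuse the resulting boolean Series.
mask = df["email"].str.startswith(x1)
df1 = df[mask].copy()
df2 = df[~mask].copy()

# "df minus df1": drop the rows of df whose index labels appear in df1.
# Assumption: the index of df has no duplicate labels.
df2_alt = df.drop(df1.index)

print(df1)
print(df2)
print(df2.equals(df2_alt))
```

Negating the mask is probably the simpler choice, since it does not depend on the index. If the email column can contain missing values, passing na=False to startswith keeps the mask strictly boolean so that ~mask still works.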
