Reputation: 345
pls review the code below, is there a more efficient way of splitting one DF into two? In the code below, the query is run twice. Would it be faster to just run the query once, and basically say if true send to DF1, else to DF2 ; or maybe after DF1 is created, someway to say that DF2 = DF minus DF1
code:
x1='john'
df = pd.read_csv(file, sep='\n', header=None, engine='python', quoting=3)
df = df[0].str.strip(' \t"').str.split('[,|;: \t]+', 1, expand=True).rename(columns={0: 'email', 1: 'data'})
df1= df[df.email.str.startswith(x1)]
df2= df[~df.email.str.startswith(x1)]
Upvotes: 1
Views: 77
Reputation: 78650
There's no need to compute the mask df.emailclean.str.startswith(x1)
twice.
mask = df.emailclean.str.startswith(x1)
df1 = df[mask].copy() # in order not have SettingWithCopyWarning
df2 = df[~mask].copy() # https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
Upvotes: 2