Reputation: 616
I have a pandas dataframe with two series/columns that I want to combine into a new series/column. I already have a for loop that does what I need, but I would rather write it as a list comprehension and cannot figure out how. My code also takes a considerable amount of time to execute. I read that list comprehensions run quicker; is there an even quicker way?
If a value in 'lead_owner' matches one of the distinct/unique values in 'agent_final', use that value. Otherwise use the value from 'agent_final' in the same row.
my_list = []
for x, y in zip(list(df['lead_owner']), list(df['agent_final'])):
    if x in set(df['agent_final']):
        my_list.append(x)
    else:
        my_list.append(y)
Upvotes: 1
Views: 95
Reputation: 2087
I would suggest you try pandas apply and share how it performs:
agents = set(df['agent_final'])
df['result'] = df.apply(lambda x: x['lead_owner'] if x['lead_owner'] in agents else x['agent_final'], axis=1)
Then call to_list() on the new column if you need a plain list.
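As a rough check of the apply approach, here is a minimal sketch on a small made-up frame. The sample values are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"],
    "agent_final": ["1", "2", "a", "c"],
})

# Build the lookup set once, outside the row-wise function.
agents = set(df["agent_final"])

# axis=1 passes each row to the lambda as a Series.
df["result"] = df.apply(
    lambda row: row["lead_owner"] if row["lead_owner"] in agents else row["agent_final"],
    axis=1,
)

# Convert the new column to a plain Python list if one is needed.
result_list = df["result"].to_list()
print(result_list)
```

Note that apply with axis=1 still calls a Python function once per row, so it is not necessarily faster than a list comprehension; timing it on the real data is the only way to know.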
Upvotes: 1
Reputation: 5405
The way to do this using a list comprehension:
my_list = [x if x in set(df['agent_final']) else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]
It's hard to say why your code is running slowly without knowing the size of your data.
One sure way to speed it up is to stop constructing the set every time you check whether x is in it. Construct the set once, outside of the for loop/list comprehension:
agent_final_set = set(df['agent_final'])
my_list = [x if x in agent_final_set else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]
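To make the cost of rebuilding the set concrete, here is a minimal sketch that times both variants on plain Python lists. The generated values and the function names are hypothetical stand-ins for the two dataframe columns, and the timings will vary by machine.

```python
import timeit

# Hypothetical stand-ins for df['lead_owner'] and df['agent_final'].
lead_owner = [f"agent_{i % 50}" for i in range(2000)]
agent_final = [f"agent_{i % 30}" for i in range(2000)]

def rebuild_set_each_time():
    # set(...) is rebuilt for every element, so total work grows quadratically.
    return [x if x in set(agent_final) else y
            for x, y in zip(lead_owner, agent_final)]

def build_set_once():
    # The set is built a single time; each membership test is then cheap.
    agent_final_set = set(agent_final)
    return [x if x in agent_final_set else y
            for x, y in zip(lead_owner, agent_final)]

# Both variants should produce exactly the same list.
same_result = rebuild_set_each_time() == build_set_once()

slow = timeit.timeit(rebuild_set_each_time, number=3)
fast = timeit.timeit(build_set_once, number=3)
print(f"same result: {same_result}")
print(f"set rebuilt per element: {slow:.4f}s, set built once: {fast:.4f}s")
```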
Upvotes: 2
Reputation: 92854
With a numpy.where one-liner:
my_list = np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)
Simple example:
In [284]: df
Out[284]:
  lead_owner agent_final
0          a           1
1          b           2
2          c           a
3          e           c
In [285]: np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)
Out[285]: array(['a', '2', 'c', 'c'], dtype=object)
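For anyone who wants to run this outside an interactive session, the example above can be sketched as a standalone script. The frame is rebuilt from the values shown, assuming they are all strings; the variable names mask, combined and as_list are my own.

```python
import numpy as np
import pandas as pd

# Same values as the example frame above, assumed to be strings.
df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"],
    "agent_final": ["1", "2", "a", "c"],
})

# True where the lead_owner value occurs anywhere in agent_final.
mask = df["lead_owner"].isin(df["agent_final"])

# Pick lead_owner where the mask is True, agent_final otherwise.
combined = np.where(mask, df["lead_owner"], df["agent_final"])

# np.where returns a NumPy array; store it as a column or convert to a list.
df["result"] = combined
as_list = combined.tolist()
print(as_list)
```

Because isin and np.where work on whole columns at once, this avoids the per-row Python loop entirely.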
Upvotes: 0
Reputation: 235984
I removed some unnecessary code and extracted the creation of the set outside of the main loop. Let's see if this runs faster:
agents = set(df['agent_final'])
data = zip(df['lead_owner'], df['agent_final'])
result = [x if x in agents else y for x, y in data]
Upvotes: 1