Reputation: 1460
I have two data frames in which one of the column contains lists. I have to difference the columns like shown below.
DF1:
A B
111 [12,13,14,14,15,13]
222 [15,16,17,15,17,17,17]
333 [17,14,16,14,14,17,17,16]
444 [25,26,18,12,12,12,13,18]
DF2:
A B
111 [12,14]
222 []
333 [17,16]
444 [25,18]
Expected Output:
A B
111 [13,15,13]
222 [15,16,17,15,17,17,17]
333 [14,14,14]
444 [26,12,12,12,13]
Upvotes: 2
Views: 52
Reputation: 1460
This worked for me in a less time compared to all the above
df3 = df1.merge(df2, on='A')
def set_diff(movie, all_):
if movie is not None:
return [item for item in movie if item not in all_]
else:
all_
movie_list = []
for item, row in df3.iterrows():
movie = row['df2.B']
all_ = row['df1.B']
movie_list.append(set_diff(movie, all_))
Upvotes: 0
Reputation: 294318
Just a demonstration of the use of pipe
after a merge
def f(t):
return [i for i in t[0] if not i in t[1]]
df1.merge(df2, on='A').pipe(
lambda d: d[['A']].assign(B=list(map(f, d.drop('A', 1).values)))
)
A B
0 111 [13, 15, 13]
1 222 [15, 16, 17, 15, 17, 17, 17]
2 333 [14, 14, 14]
3 444 [26, 12, 12, 12, 13]
# Heavy lifting for differencing
def f(t):
return [i for i in t[0] if not i in t[1]]
# Merge the same as AmiTavory
# But then I use pipe and assign. Dbl Brackets to keep single column
# dataframe and assign to create a new B column
# then I use the values from the merge after dropping the A column
df1.merge(df2, on='A').pipe(
lambda d: d[['A']].assign(B=list(map(f, d.drop('A', 1).values)))
)
Upvotes: 0
Reputation: 1343
You should keep in mind that pandas do not store the list as "actual list" but as an object. You should always try to make the columns with atomic values instead of collections to leverage the full use of pandas capabilities. That being said, to do your desired transformation you can just convert df2's columns to set and remove all those items in corresponding columns of df1.
You need to be sure to convert the "object lists" to "actual list/set" before doing the operations.
Here is the code for this:
Method to apply on df1
def fun(x):
# Find the list corresponding to the column A of df1 in df2
# Use indexing to make this step faster
remove_set = set(df2[df2['A']==x['A']].iloc[0]['B'])
actual_list = list(x['B'])
new_list = []
for i in actual_list:
if i not in remove_set:
new_list.append(i)
return new_list
calling the method as
df1['B'] = df1.apply(fun, axis=1)
produces the output as
A B
0 111 [13, 15, 13]
1 222 [15, 16, 17, 15, 17, 17, 17]
2 333 [14, 14, 14]
3 444 [26, 12, 12, 12, 13]
Note: if you can use indexing on column A then the performance of this code will greatly improve.
Upvotes: 0
Reputation: 76297
You can leverage Pandas's merge abilities, and Python's efficient set
data structure.
First, merge:
df3 = df1.merge(df2, on='A')
Following that, make df2's items into a set:
df3.B_y = df3.B_y.apply(set)
Now use list comprehension to iterate over df1's items that are not in the set, for each row:
df3['res'] = df3.apply(lambda r: [e for e in r.B_x if e not in r.B_Y], axis=1)
Upvotes: 2
Reputation: 71
You can try this,
df1["B"]=[list(i for i in df1["B"][j] if i not in df2["B"][j]) for j in range(df1.shape[0])]
Upvotes: 1