Reputation: 666
How can I create a new list column from a list column
My dataframe:
id x list_id
1 20 [2, 4]
2 10 [1, 3]
3 10 [1]
4 30 [1, 2]
What I want:
id x list_id list_x
1 20 [2, 4] [10, 30]
2 10 [1, 3] [20, 10]
3 10 [1] [20]
4 30 [1, 2] [20, 10]
My first idea was to iterate over each row and check whether the id is in that row's list:
for index, row in df.iterrows():
    if ( df['id'].isin(row['list_id']) ):
        do_something
But it's not working. Any suggestions?
Upvotes: 3
Views: 4084
Reputation: 294308
Creative Solution
Using numpy object arrays with set elements
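# i[k] is the one-element set {id_k}
i = np.array([set([x]) for x in df.id.values.tolist()])
# x[k] is the one-element list [x_k]
x = np.empty(i.shape, dtype=object)
x[:] = [[x] for x in df.x.values.tolist()]
# y holds empty lists, used wherever an id is not selected
y = np.empty_like(x)
y.fill([])
# j[r] is the set of ids requested by row r
j = np.array([set(x) for x in df.list_id.values.tolist()])
# i <= j[:, None] is a subset test broadcast over a (rows, ids) grid;
# summing the chosen lists along axis 1 concatenates them per row
df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))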
id x list_id list_x
0 1 20 [2, 4] [10, 30]
1 2 10 [1, 3] [20, 10]
2 3 10 [1] [20]
3 4 30 [1, 2] [20, 10]
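To make the subset comparison easier to follow, here is a minimal sketch of just the mask step on the same data. It is an illustration rather than part of the solution above; the names ids, list_ids and mask are made up for this example, and the object arrays are filled element by element so NumPy does not try to unpack the sets.

```python
import numpy as np

ids = [1, 2, 3, 4]
list_ids = [[2, 4], [1, 3], [1], [1, 2]]

# Object array whose k-th element is the one-element set {ids[k]}.
i = np.empty(len(ids), dtype=object)
for k, v in enumerate(ids):
    i[k] = {v}

# Object array whose r-th element is the set of ids requested by row r.
j = np.empty(len(list_ids), dtype=object)
for r, v in enumerate(list_ids):
    j[r] = set(v)

# For sets, `a <= b` means "a is a subset of b". Broadcasting i with shape
# (4,) against j[:, None] with shape (4, 1) evaluates the test for every
# (row, id) pair, so mask[r, c] is True when ids[c] is in list_ids[r].
mask = (i <= j[:, None]).astype(bool)

# Row 0 asks for ids 2 and 4, so only columns 1 and 3 are True.
print(mask[0])
```

np.where then picks the one-element list [x] where the mask is True and an empty list elsewhere, and summing lists along axis 1 concatenates them.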
Timing
%timeit df.assign(list_x=[df.x[df['id'].isin(l)].values for l in df.list_id])
1000 loops, best of 3: 1.21 ms per loop
%%timeit
i = np.array([set([x]) for x in df.id.values.tolist()])
x = np.empty(i.shape, dtype=object)
x[:] = [[x] for x in df.x.values.tolist()]
y = np.empty_like(x)
y.fill([])
j = np.array([set(x) for x in df.list_id.values.tolist()])
df.assign(list_x=np.where(i <= j[:, None], x, y).sum(1))
1000 loops, best of 3: 371 µs per loop
Upvotes: 0
Reputation: 1599
Use a list comprehension:
df.loc[:,'list_x'] = [df.x[df['id'].isin(l)].values for l in df.list_id]
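As a rough sketch of what one iteration of that comprehension does, here is the same lookup for the first row only. The names ids, mask and selected are just for illustration. It also shows why the loop in the question fails: isin returns a boolean Series, and using a multi-element Series directly in an if statement raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

# The list of ids stored in the first row.
ids = df.loc[0, 'list_id']

# Boolean Series: True for every row whose id appears in `ids`.
# Writing `if mask:` here would raise ValueError, because the truth
# value of a multi-element Series is ambiguous.
mask = df['id'].isin(ids)

# Use the mask to select the matching x values instead of testing it.
selected = df.loc[mask, 'x'].tolist()
print(selected)
```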
Full example with dummy data:
import pandas as pd
data = {
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
}
df = pd.DataFrame(data)
df.loc[:, 'list_x'] = [df.x[df['id'].isin(l)].values for l in df.list_id]
Output
print(df)
id list_id x list_x
1 [2, 4] 20 [10, 30]
2 [1, 3] 10 [20, 10]
3 [1] 10 [20]
4 [1, 2] 30 [20, 10]
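One caveat: isin returns the x values in the dataframe's row order, not in the order the ids appear inside each list. With the sample data the lists are already sorted, so both orders coincide. If the order within each list matters, a possible variation (a sketch, not the approach above) is a plain dict lookup. It assumes every id is unique and every entry of list_id exists in the id column; id_to_x is a made-up name.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'x': [20, 10, 10, 30],
    'list_id': [[2, 4], [1, 3], [1], [1, 2]],
})

# Assumption: ids are unique, so each id maps to exactly one x value.
id_to_x = dict(zip(df['id'], df['x']))

# Look up each id in the order it appears in the row's list.
# A missing id would raise KeyError under the stated assumption.
df['list_x'] = [[id_to_x[i] for i in ids] for ids in df['list_id']]
print(df)
```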
Upvotes: 4