Reputation: 2232
I have a Pandas dataframe(df) with following columns:
df["ids"]
0 18281483,1658391547
1 1268212,128064430
2 1346542425
3 13591493,13123669,35938208
df["id"]
0 18281483
1 1268212
2 1346542425
3 13123669
I like to find out, in which order of "ids" the respective "id" can be found, and output the respective value in a new column "order". Following code was tried without success:
df["order"] = df["ids"].str.split(",").index(df["id"])
----------------------------------------------------------------------
TypeError: 'Int64Index' object is not callable
Is there a syntax error? I tried the split and index function with every row manually (by inserting the lists and string), and it worked.
Desired output:
df["order"]
0 0
1 0
2 0
3 1
Upvotes: 0
Views: 265
Reputation: 51165
Really shouldn't need to use apply
here. On larger Dataframes it will be incredibly slow. Broadcasted comparison will work just fine.
(df["ids"].str.split(",", expand=True) == df["id"][:, None]).idxmax(1)
0 0
1 0
2 0
3 1
dtype: int64
Performance
d = {'ids': {0: '18281483,1658391547',
1: '1268212,128064430',
2: '1346542425',
3: '13591493,13123669,35938208'},
'id': {0: '18281483',
1: '1268212',
2: '1346542425',
3: '13123669'}}
df = pd.DataFrame(d)
df = pd.concat([df] * 1000)
%timeit (df["ids"].str.split(",", expand=True) == df["id"][:, None]).idxmax(1)
7.51 ms ± 61.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.apply(lambda x: x['ids'].split(',').index(x['id']), axis=1)
54.1 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 0
Reputation: 150735
Try:
df['output'] = df.astype(str).apply(lambda x: x['ids'].split(',').index(x['id']), axis=1)
Output:
ids id output
0 18281483,1658391547 18281483 0
1 1268212,128064430 1268212 0
2 1346542425 1346542425 0
3 13591493,13123669,35938208 13123669 1
Upvotes: 1
Reputation: 8302
Here is a approach,
def index_(ids, id):
split_ = ids.split(",")
if id in split_:
return split_.index(id)
else:
return -1
print(
df.assign(id = df1.id.astype(str))
.apply(lambda x: index_(x.ids, x.id), axis=1)
)
0 0
1 0
2 0
3 1
dtype: int64
Upvotes: 1