asus chen

Reputation: 51

How to slice strings in a column by another column in pandas

import pandas as pd
df = pd.DataFrame({'A': ['abcde', 'fghij', 'klmno', 'pqrst'], 'B': [1, 2, 3, 4]})

I want to slice each string in column A by the corresponding value in column B, e.g. abcde[:1] = a and klmno[:3] = klm, but both of these statements failed:

df['new_column']=df.A.map(lambda x: x.str[:df.B])
df['new_column']=df.apply(lambda x: x.A[:x.B]) 

TypeError: string indices must be integers

and

df['new_column']=df['A'].str[:df['B']]

new_column contains only NaN.

This is the new_column I am trying to get:

      A    B  new_column
0   abcde  1     a
1   fghij  2     fg
2   klmno  3     klm
3   pqrst  4     pqrs

Thank you so much

Upvotes: 1

Views: 1898

Answers (2)

Rohit-Pandey

Reputation: 2159

You can do this with zip. I hope this solution helps.


Upvotes: 3

akuiper

Reputation: 214957

You need axis=1 in the apply method to loop through rows:

df['new_column'] = df.apply(lambda r: r.A[:r.B], axis=1)
df
#       A   B   new_column
#0  abcde   1   a
#1  fghij   2   fg
#2  klmno   3   klm
#3  pqrst   4   pqrs
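
To see why axis=1 matters, here is a small sketch (the variable names by_column, by_row and sliced are just illustrative). With the default axis=0, apply hands each column to the function as a Series; with axis=1 it hands over each row, so r.A and r.B are the scalar values for that row:

```python
import pandas as pd

df = pd.DataFrame({'A': ['abcde', 'fghij', 'klmno', 'pqrst'], 'B': [1, 2, 3, 4]})

# Default axis=0: the lambda is called once per column, so .name is the column label.
by_column = df.apply(lambda s: s.name)

# axis=1: the lambda is called once per row, so .name is the row's index label.
by_row = df.apply(lambda r: r.name, axis=1)

# With axis=1, r.A is a single string and r.B a single integer, so slicing works.
sliced = df.apply(lambda r: r.A[:r.B], axis=1)

print(list(by_column))
print(list(by_row))
print(list(sliced))
```

Without axis=1, the lambda in the question receives whole columns rather than one string and one integer per row, which is why the slice fails.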

A less idiomatic but usually faster solution is to use zip:

df['new_column'] = [A[:B] for A, B in zip(df.A, df.B)]
df

#       A   B   new_column
#0  abcde   1   a
#1  fghij   2   fg
#2  klmno   3   klm
#3  pqrst   4   pqrs

%timeit df.apply(lambda r: r.A[:r.B], axis=1)
# 1000 loops, best of 3: 440 µs per loop

%timeit [A[:B] for A, B in zip(df.A, df.B)]
# 10000 loops, best of 3: 27.6 µs per loop
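
The %timeit lines above only work in IPython/Jupyter. As a rough sketch, the same comparison can be run in a plain script with the standard timeit module. The helper names with_apply and with_zip and the 1000-row test frame are my own assumptions, and the absolute numbers will depend on your machine and pandas version:

```python
import timeit

import pandas as pd

base = pd.DataFrame({'A': ['abcde', 'fghij', 'klmno', 'pqrst'], 'B': [1, 2, 3, 4]})
# Repeat the four sample rows to get a 1000-row frame for a less noisy comparison.
big = pd.concat([base] * 250, ignore_index=True)


def with_apply(frame):
    # Row-wise apply: builds a Series object for every row before slicing.
    return frame.apply(lambda r: r['A'][:r['B']], axis=1).tolist()


def with_zip(frame):
    # Plain Python loop over the two columns in parallel.
    return [a[:b] for a, b in zip(frame['A'], frame['B'])]


# Total time in seconds for 5 calls of each approach.
t_apply = timeit.timeit(lambda: with_apply(big), number=5)
t_zip = timeit.timeit(lambda: with_zip(big), number=5)
print(f"apply: {t_apply:.4f} s, zip: {t_zip:.4f} s (5 runs each)")
```

Both helpers return the same list of slices, so the choice between them is only about readability and speed.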

Upvotes: 12
