I have a pd.DataFrame like the following:
import pandas as pd
df = pd.DataFrame(["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"], ["SKDJF", "FDKSJFSSDF", "SIDFDS"])
I want to extract a substring from each string in the first column, where the length of the substring depends on the length of the corresponding string in the second column (in the snippet above, the second list is passed as the DataFrame's index). Specifically, I want the characters from the 2nd character of col1 up to the nth character of col1, where n is the number of characters in the corresponding string in col2.
How can this be done?
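A minimal sketch of what the constructor call in the question actually builds, assuming a recent pandas. The second positional parameter of pd.DataFrame is index, so the short strings become row labels rather than a second column; reset_index() moves them into a real column named 'index'. The name df2 is only for illustration.

```python
import pandas as pd

# The second positional argument of pd.DataFrame is `index`, so the
# short strings become row labels, not a second column.
df = pd.DataFrame(
    ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"],
    ["SKDJF", "FDKSJFSSDF", "SIDFDS"],
)

print(df.columns.tolist())  # [0]
print(df.index.tolist())    # ['SKDJF', 'FDKSJFSSDF', 'SIDFDS']

# Moving the unnamed index into a column gives two columns: 'index' and 0.
df2 = df.reset_index()
print(df2.columns.tolist())  # ['index', 0]
```

Which layout you start from decides whether the lengths come from df.index or from a column.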
This is one way using a list comprehension:
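import pandas as pd
df = pd.DataFrame({'A': ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF",
                         "K<NFSKJGHSDUFSDK"]},
                  index=["SKDJF", "FDKSJFSSDF", "SIDFDS"])
# slice each string in A from its 2nd character, taking len(index label) characters
df['B'] = [j[1:i+1] for i, j in zip(df.index.map(len), df['A'].values)]
print(df)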
                                 A           B
SKDJF       SSDILFJKSIDHFKJSHDKUFH       SDILF
FDKSJFSSDF      SLIDFSOIUDHFIUSDHF  LIDFSOIUDH
SIDFDS            K<NFSKJGHSDUFSDK      <NFSKJ
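The slice j[1:i+1] returns i characters starting at the 2nd one, which is one character more than a literal reading of "2nd to nth character" (that would be j[1:i]). A small sketch comparing the two readings; the column names B and C are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"]},
    index=["SKDJF", "FDKSJFSSDF", "SIDFDS"],
)

# n for each row = length of its index label
lengths = df.index.map(len)

# n characters starting at the 2nd one (same slice as the answer above)
df["B"] = [s[1:n + 1] for n, s in zip(lengths, df["A"])]

# literally the 2nd through the nth character (n - 1 characters)
df["C"] = [s[1:n] for n, s in zip(lengths, df["A"])]

print(df[["B", "C"]])
```

Pick whichever slice matches the intended definition of n; the rest of the code is unchanged.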
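You can try with apply. Here df is assumed to be the question's frame after df = df.reset_index(), which moves the labels into a column named 'index':
df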
Out[115]:
        index                       0
0       SKDJF  SSDILFJKSIDHFKJSHDKUFH
1  FDKSJFSSDF      SLIDFSOIUDHFIUSDHF
2      SIDFDS        K<NFSKJGHSDUFSDK
df.apply(lambda x: x[0][1:len(x['index'])+1], axis=1)
Out[116]:
0         SDILF
1    LIDFSOIUDH
2        <NFSKJ
dtype: object
Or, in plain Python with a list comprehension:
[y[1:len(x)+1] for x, y in zip(df['index'], df[0])]
Out[117]: ['SDILF', 'LIDFSOIUDH', '<NFSKJ']
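A self-contained sketch of the apply approach, assuming the two-column layout comes from calling reset_index() on the question's frame (which names the moved column 'index'). It takes n characters starting at the 2nd one, the same slice as the list-comprehension answer; row.loc[0] is used so the lookup is explicitly by the column label 0 rather than by position.

```python
import pandas as pd

# Start from the question's frame and move the index into a column named
# 'index', the layout the apply answer works with.
df = pd.DataFrame(
    ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"],
    ["SKDJF", "FDKSJFSSDF", "SIDFDS"],
).reset_index()

# Row-wise: from the string in column 0, take len(row['index'])
# characters starting at the 2nd character.
result = df.apply(lambda row: row.loc[0][1:len(row["index"]) + 1], axis=1)
print(result.tolist())  # ['SDILF', 'LIDFSOIUDH', '<NFSKJ']
```

apply with axis=1 calls the lambda once per row, so on large frames the zip-based list comprehension is usually the faster of the two.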