Jack Arnestad

Reputation: 1865

Extract substring from all rows in pandas data frame

I have a pd.DataFrame like the following:

pd.DataFrame(["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"], ["SKDJF", "FDKSJFSSDF", "SIDFDS"])

I want to extract subsequences from the first column, but the length of the subsequence I want depends on the length of the sequence in the second column. I want to extract the characters from the 2nd character in col1 to the nth character in col1, where n is defined as the number of characters in the corresponding string in col2.

How can this be done?

Upvotes: 2

Views: 957

Answers (2)

jpp

Reputation: 164663

This is one way using a list comprehension:

import pandas as pd

df = pd.DataFrame({'A': ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF",
                         "K<NFSKJGHSDUFSDK"]},
                  index=["SKDJF", "FDKSJFSSDF", "SIDFDS"])

# i is the length of each index label, j is the matching string in column A
df['B'] = [j[1:i+1] for i, j in zip(df.index.map(len), df['A'])]

print(df)

                                 A           B
SKDJF       SSDILFJKSIDHFKJSHDKUFH       SDILF
FDKSJFSSDF      SLIDFSOIUDHFIUSDHF  LIDFSOIUDH
SIDFDS            K<NFSKJGHSDUFSDK      <NFSKJ
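
The question describes two columns, while the constructor call there places the second list in the index. As a minimal sketch, assuming the data really sits in two ordinary columns (the names col1, col2 and sub are hypothetical, chosen to match the question's wording), the same slicing could look like this:

```python
import pandas as pd

# Hypothetical two-column layout; "col1", "col2" and "sub" are assumed names.
df2 = pd.DataFrame({
    "col1": ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"],
    "col2": ["SKDJF", "FDKSJFSSDF", "SIDFDS"],
})

# Slice each col1 string from position 1 up to and including position
# len(col2), the same slice as the list comprehension above.
df2["sub"] = [s[1:len(k) + 1] for s, k in zip(df2["col1"], df2["col2"])]

print(df2["sub"].tolist())
# ['SDILF', 'LIDFSOIUDH', '<NFSKJ']
```

If the slice should stop at the nth character itself rather than one past it, s[1:len(k)] would be the slice to use instead.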

Upvotes: 2

BENY

Reputation: 323226

You can try with apply, once the index has been moved into a column:

df
Out[115]: 
        index                       0
0       SKDJF  SSDILFJKSIDHFKJSHDKUFH
1  FDKSJFSSDF      SLIDFSOIUDHFIUSDHF
2      SIDFDS        K<NFSKJGHSDUFSDK
df.apply(lambda x: x[0][len(x['index'])], axis=1)
Out[116]: 
0    F
1    H
2    J
dtype: object

Or just use plain Python:

[y[len(x)] for x, y in zip(df['index'], df[0])]
Out[117]: ['F', 'H', 'J']
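
Both expressions above return the single character at position len(x), not a substring. A hedged sketch of how the same two approaches could return the substring asked for in the question, assuming the frame was produced with reset_index() (which is an assumption based on the column labels shown in the output):

```python
import pandas as pd

# Rebuild the frame shown above: the original index becomes a column named
# "index" and the data column keeps its default label 0 (assumed construction).
df = pd.DataFrame(
    ["SSDILFJKSIDHFKJSHDKUFH", "SLIDFSOIUDHFIUSDHF", "K<NFSKJGHSDUFSDK"],
    index=["SKDJF", "FDKSJFSSDF", "SIDFDS"],
).reset_index()

# Slice from position 1 up to and including position len(index label),
# instead of picking out one character.
with_apply = df.apply(lambda row: row[0][1:len(row["index"]) + 1], axis=1)

# The same slice with a plain list comprehension.
with_zip = [y[1:len(x) + 1] for x, y in zip(df["index"], df[0])]

print(with_apply.tolist())
print(with_zip)
```

Both variants give the same three substrings; the list comprehension avoids the per-row overhead of apply.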

Upvotes: 1
