sam123

Reputation: 43

Pandas dataframe: slicing column values using second column for slice index

I'm trying to create a column of microsatellite motifs in a pandas dataframe. I have one column that gives the length of the motif and another that has the whole microsatellite.

Here's an example of the columns of interest.

     motif_len    sequence
0    3            ATTATTATTATT
1    4            ATCTATCTATCT
2    3            ATCATCATCATC

I would like to slice the values in sequence using the values in motif_len to get a single repeat (motif) of each microsatellite. I'd then like to add these motifs as a third column in the dataframe, giving something like this.

     motif_len    sequence        motif
0    3            ATTATTATTATT    ATT
1    4            ATCTATCTATCT    ATCT
2    3            ATCATCATCATC    ATC

I've tried a few things with no luck.

>>> df['motif'] = df.sequence.str[:df.motif_len]
>>> df['motif'] = df.sequence.str[:df.motif_len.values]

Both make the motif column but all the values are NaN.

I think I understand why these don't work: I'm passing a whole series/array as the upper index of the slice rather than a single value from the motif_len column.

I also tried to create a series by iterating through each row. Any ideas?

Upvotes: 4

Views: 2568

Answers (1)

EdChum

Reputation: 393893

You can call apply on the df, passing axis=1 so the function is applied row-wise, and use the column values to slice the str:

In [5]:
df['motif'] = df.apply(lambda x: x['sequence'][:x['motif_len']], axis=1)
df

Out[5]:
   motif_len      sequence motif
0          3  ATTATTATTATT   ATT
1          4  ATCTATCTATCT  ATCT
2          3  ATCATCATCATC   ATC
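
If the row-wise apply turns out to be slow on a large frame, one possible alternative is a plain list comprehension that pairs each sequence with its own length. This is only a sketch: it rebuilds the example frame from the question, and it assumes motif_len holds integers with no missing values.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "motif_len": [3, 4, 3],
    "sequence": ["ATTATTATTATT", "ATCTATCTATCT", "ATCATCATCATC"],
})

# Pair each sequence with its own length and slice in plain Python.
# Assumes motif_len contains integers and no NaN values.
df["motif"] = [seq[:n] for seq, n in zip(df["sequence"], df["motif_len"])]

print(df)
```

Both approaches slice one string per row; the comprehension just avoids building a Series object for every row, which apply with axis=1 does.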

Upvotes: 4
