Sz P

Reputation: 13

New column based on rows from a specific column in pandas

I have data like the table in the image. I need to take the sequences whose Type is "secstr" and put each one in a new column, next to the Sequence column of the row that has the same PDB_ID and Chain. Finally, I want to delete the rows that contain the "secstr" sequences.

So far I have something like this:

sequences["Secstr"] = sequences.Sequence[
    (sequences['PDB_ID'] == sequences['PDB_ID']) & 
    (sequences['Chain'] == sequences['Chain']) & 
    (sequences['Type'] == 'secstr')]
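Why the attempt above does not pair the rows: a column compared with itself is True on every non-null row, so only the Type condition actually filters anything, and assigning a filtered Series back to the DataFrame aligns on the row index. Each "secstr" value therefore stays in its own row, and the sequence rows get NaN. A minimal sketch demonstrating this, assuming hypothetical data with a Type value of "sequence" for the non-secstr rows and shortened strings:

```python
import pandas as pd

# Hypothetical long-format data: one 'sequence' row and one 'secstr' row
# per (PDB_ID, Chain) pair. Column names follow the question; the values
# (including the 'sequence' Type label) are made up for illustration.
sequences = pd.DataFrame({
    "PDB_ID": ["101M", "101M", "102L", "102L"],
    "Chain": ["A", "A", "A", "A"],
    "Type": ["sequence", "secstr", "sequence", "secstr"],
    "Sequence": ["MVLSEG", "HHHH  ", "MVLSEG", "HHHHHH"],
})

# Comparing a column with itself is always True for non-null values,
# so this mask is equivalent to sequences["Type"] == "secstr".
mask = (
    (sequences["PDB_ID"] == sequences["PDB_ID"])
    & (sequences["Chain"] == sequences["Chain"])
    & (sequences["Type"] == "secstr")
)
sequences["Secstr"] = sequences.Sequence[mask]

# Assignment aligns on the row index: the 'secstr' rows get their own
# Sequence value, while the 'sequence' rows get NaN.
print(sequences)
```

To pair the rows, the two kinds of rows have to be matched on PDB_ID and Chain explicitly (a join or a reshape), not through a boolean filter.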

[Image: table with the input data]

The data I need should look like this:

    PDB_ID  Chain          Sequence                  Secstr
0   101M     A       MVLSEGEWQLVLHVWAKVEA       HHHH  HHHHGGHH HHHH
1   102L     A       MVLSEGEWQLVLHVWAKVEA    HHHH  HHHHHHHGGHH   HH
2   102M     A       MVLSEGEWQLVLHVWAKVEA    HHHHHHHHHGGHH HHH     
3   103L     A       MVLSEGEWQLVLHVWAKVEA       HHHHH HHHHHH HHGGH 
4   103L     B       MVLSEGEWQLVLHVWAKVEA       HHHHH HHHHHH HHHHH 
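The reshaping described above (one row per PDB_ID/Chain pair, with the "secstr" values moved into their own column) is a long-to-wide pivot. A minimal sketch using `set_index` plus `unstack`, assuming the input has the columns PDB_ID, Chain, Type and Sequence and that the non-secstr rows have Type "sequence"; the sample rows are hypothetical and shortened:

```python
import pandas as pd

# Hypothetical input; column names and the Type values 'sequence' /
# 'secstr' are assumptions, and the strings are shortened.
df = pd.DataFrame({
    "PDB_ID": ["101M", "101M", "103L", "103L", "103L", "103L"],
    "Chain": ["A", "A", "A", "A", "B", "B"],
    "Type": ["sequence", "secstr", "sequence", "secstr", "sequence", "secstr"],
    "Sequence": ["MVLSEG", "HHHH  ", "MVLSEG", " HHHHH", "MVLSEG", " HHHGG"],
})

# Long-to-wide: one row per (PDB_ID, Chain), one column per Type value.
# unstack raises ValueError if a (PDB_ID, Chain, Type) triple is duplicated.
wide = (
    df.set_index(["PDB_ID", "Chain", "Type"])["Sequence"]
    .unstack("Type")
    .rename(columns={"sequence": "Sequence", "secstr": "Secstr"})
    .reset_index()
)

# Put the columns in the desired order and drop the leftover "Type" axis name.
wide = wide[["PDB_ID", "Chain", "Sequence", "Secstr"]]
wide.columns.name = None
print(wide)
```

Because the secstr rows become a column, they disappear as rows automatically, so no separate delete step is needed.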

Upvotes: 0

Views: 57

Answers (2)

r-beginners

Reputation: 35115

Split off the 'secstr' rows into a second DF, combine it side by side with the remaining rows (aligned on PDB_ID and Chain), and then drop the columns you no longer need. Does this meet the intent of the question?

import pandas as pd

# Splitting the DF by 'Type': the 'secstr' rows, indexed by (PDB_ID, Chain)
df2 = df[df['Type'] == 'secstr'].set_index(['PDB_ID', 'Chain'])
# Everything except 'secstr' (the 'sequence' rows), with the same index
df = df[df['Type'] != 'secstr'].set_index(['PDB_ID', 'Chain'])

# Combining DF and DF2 (in the column direction), aligned on (PDB_ID, Chain)
new_df = pd.concat([df, df2], axis=1)
new_df.reset_index(inplace=True)

# Renaming the columns (assumes the original columns are PDB_ID, Chain, Type, Sequence)
new_cols = ['PDB_ID', 'Chain', 'Type', 'Sequence', 'Type1', 'Secstr']
new_df.columns = new_cols

# Deleting the two 'Type' columns, which are no longer needed
new_df.drop(columns=new_df.columns[[2, 4]], inplace=True)

    new_df
    PDB_ID  Chain   Sequence    Secstr
0   101M    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
1   102L    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
2   102M    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
3   103L    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
4   103M    A   HJGDSDDLKEIEWUSKDSK OLKDSJDJFYEUKIBK
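The code above relies on column positions (`new_cols` and `columns[[2, 4]]`), so it breaks if the input columns are in a different order. A sketch of the same split-and-combine idea using `merge` on the key columns instead, which selects and renames columns by name. It assumes the columns PDB_ID, Chain, Type and Sequence, and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical input; column names and Type values are assumptions.
df = pd.DataFrame({
    "PDB_ID": ["101M", "101M", "103L", "103L", "103L", "103L"],
    "Chain": ["A", "A", "A", "A", "B", "B"],
    "Type": ["sequence", "secstr", "sequence", "secstr", "sequence", "secstr"],
    "Sequence": ["MVLSEG", "HHHH  ", "MVLSEG", " HHHHH", "MVLSEG", " HHHGG"],
})

keys = ["PDB_ID", "Chain"]
is_secstr = df["Type"] == "secstr"

# Left side: the non-'secstr' rows, keeping only the keys and Sequence.
seq = df.loc[~is_secstr, keys + ["Sequence"]]
# Right side: the 'secstr' rows, with Sequence renamed to Secstr.
sec = df.loc[is_secstr, keys + ["Sequence"]].rename(columns={"Sequence": "Secstr"})

# Join on (PDB_ID, Chain). how="left" keeps sequences that have no secstr
# row (their Secstr becomes NaN); validate="one_to_one" raises a MergeError
# if a key pair appears more than once on either side.
new_df = seq.merge(sec, on=keys, how="left", validate="one_to_one")
print(new_df)
```

The `validate` check is optional, but it turns a silent row duplication (for example, two secstr rows for the same chain) into an explicit error.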

Upvotes: 1

Sz P

Reputation: 13

I need the data to look like the expected output in the question: one row per PDB_ID and Chain, with the Sequence and Secstr columns side by side.

Upvotes: 0
