Alok Chauhan

Reputation: 63

Changing row names in dataframe

I have a dataframe, and one of its columns looks roughly like the one shown below. Is there any way to rename the rows? The rows should be renamed to psPARP8, psEXOC8, psTMEM128, psCFHR3, where ps stands for pseudogene and the term in brackets is the code for that pseudogene. I would highly appreciate it if anyone could write a Python function, or suggest an alternative, to perform this task.

import pandas as pd

d = {'gene_final': ["1poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
                    "exocyst complex component 8 (EXOC8) pseudogene",
                    "transmembrane protein 128 (TMEM128) pseudogene",
                    "complement factor H related 3 (CFHR3) pseudogene",
                    "mitochondrially encoded NADH 4L dehydrogenase (MT-ND4L) pseudogene",
                    "relaxin family peptide/INSL5 receptor 4 (RXFP4 ) pseudogene",
                    "nasGBP7and GBP2"]}

df = pd.DataFrame(data=d)

The desired output should look like this:

gene_final
-----------
psPARP8
psEXOC8
psTMEM128
psCFHR3
psMT-ND4L
psRXFP4
nasGBP2

Upvotes: 1

Views: 4066

Answers (2)

psychOle

Reputation: 1064

import re

import pandas as pd

# build dataframe
df = pd.DataFrame({'gene_final': ["poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
                                  "exocyst complex component 8 (EXOC8) pseudogene",
                                  "transmembrane protein 128 (TMEM128) pseudogene",
                                  "complement factor H related 3 (CFHR3) pseudogene",
                                  "mitochondrially encoded NADH 4L dehydrogenase (MT-ND4L) pseudogene",
                                  "relaxin family peptide/INSL5 receptor 4 (RXFP4 ) pseudogene"]})


def extract_name(s):
    """Helper function to extract the ps name."""
    # find a word between ' (' and ')'; one optional space before ')' is allowed
    s = re.findall(r"\s\((\S*)\s?\)", s)[0]
    s = f"ps{s}" # add ps to string
    return s

# apply function extract_name() to each row
df['gene_final'] = df['gene_final'].apply(extract_name)
print(df)
>   gene_final
> 0    psPARP8
> 1    psEXOC8
> 2  psTMEM128
> 3    psCFHR3
> 4  psMT-ND4L
> 5    psRXFP4
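
Note that extract_name() raises an IndexError on a row with no bracketed code, such as "nasGBP7and GBP2" from the question. A minimal sketch of a vectorized alternative, assuming such rows should simply be left unchanged (the question does not state the rule that turns that row into nasGBP2); the variable name `code` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"gene_final": [
    "exocyst complex component 8 (EXOC8) pseudogene",
    "relaxin family peptide/INSL5 receptor 4 (RXFP4 ) pseudogene",
    "nasGBP7and GBP2",
]})

# Same pattern as above; str.extract yields NaN where nothing matches.
code = df["gene_final"].str.extract(r"\s\((\S*)\s?\)", expand=False)

# Prefix matched codes with "ps"; keep the original text for unmatched rows
# (assumption: unmatched rows are left untouched).
df["gene_final"] = ("ps" + code).where(code.notna(), df["gene_final"])
print(df)
```

Rows that match become psEXOC8 and psRXFP4, while the last row keeps its original text and can be handled separately once its renaming rule is known.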

Upvotes: 1

Sarim Sikander

Reputation: 406

I think you are asking about index names (row labels). This is how you set the row names when building a DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [11, 21, 31],
                   'B': [12, 22, 32],
                   'C': [13, 23, 33]},
                  index=['ONE', 'TWO', 'THREE'])

print(df)

You can also change the row names after building the dataframe, like this (rename() returns a new dataframe and leaves the original unchanged):

df_new = df.rename(columns={'A': 'Col_1'}, index={'ONE': 'Row_1'})
print(df_new)
#        Col_1   B   C
# Row_1     11  12  13
# TWO       21  22  23
# THREE     31  32  33

print(df)
#         A   B   C
# ONE    11  12  13
# TWO    21  22  23
# THREE  31  32  33
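
If every row label should be changed by the same rule, rename() also accepts a function instead of a dict. A small sketch, where the "Row_" prefix and the name `df_prefixed` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [11, 21, 31]}, index=["ONE", "TWO", "THREE"])

# The callable is applied to each index label; df itself is not modified.
df_prefixed = df.rename(index=lambda label: f"Row_{label}")
print(df_prefixed.index.tolist())
```

As with the dict form, the original df keeps its labels unless you assign the result back.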

Upvotes: 1
