Reputation: 63
I have a dataframe, and one of its columns looks roughly like the one shown below. Is there any way to rename the rows? They should be renamed as psPARP8, psEXOC8, psTMEM128, psCFHR3, where ps stands for pseudogene and the term in brackets is the code for that pseudogene. I would highly appreciate it if anyone could write a Python function, or suggest an alternative, to perform this task.
import pandas as pd
d = {'gene_final': ["1poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
"exocyst complex component 8 (EXOC8) pseudogene",
"transmembrane protein 128 (TMEM128) pseudogene",
"complement factor H related 3 (CFHR3) pseudogene",
"mitochondrially encoded NADH 4L dehydrogenase (MT-ND4L) pseudogene",
"relaxin family peptide/INSL5 receptor 4 (RXFP4 ) pseudogene",
"nasGBP7and GBP2"
]}
df = pd.DataFrame(data=d)
The desired output should look like this:
gene_final
-----------
psPARP8
psEXOC8
psTMEM128
psCFHR3
psMT-ND4L
psRXFP4
nasGBP2
Upvotes: 1
Views: 4066
Reputation: 1064
import re
import pandas as pd
# build dataframe
df = pd.DataFrame({'gene_final': ["poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
                                  "exocyst complex component 8 (EXOC8) pseudogene",
                                  "transmembrane protein 128 (TMEM128) pseudogene",
                                  "complement factor H related 3 (CFHR3) pseudogene",
                                  "mitochondrially encoded NADH 4L dehydrogenase (MT-ND4L) pseudogene",
                                  "relaxin family peptide/INSL5 receptor 4 (RXFP4 ) pseudogene"]})
def extract_name(s):
    """Helper function to extract the ps name."""
    # find the first token between ' (' and ')', allowing one trailing space before ')'
    s = re.findall(r"\s\((\S*)\s?\)", s)[0]
    s = f"ps{s}"  # add the ps prefix
    return s
# apply extract_name() to each row
df['gene_final'] = df['gene_final'].apply(extract_name)
print(df)
> gene_final
> 0 psPARP8
> 1 psEXOC8
> 2 psTMEM128
> 3 psCFHR3
> 4 psMT-ND4L
> 5 psRXFP4
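Note that indexing the first match with [0] raises an IndexError on a row with no bracketed code, such as "nasGBP7and GBP2" from the question. A minimal sketch of a vectorized alternative using `Series.str.extract` is below. The `gene_short` column name is only an illustration, and because the question does not say how "nasGBP7and GBP2" becomes "nasGBP2", this sketch simply leaves such rows unchanged.

```python
import pandas as pd

df = pd.DataFrame({"gene_final": [
    "poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
    "relaxin family peptide/INSL5 receptor 4 (RXFP4 ) pseudogene",
    "nasGBP7and GBP2",  # no bracketed code in this row
]})

# Capture the first token inside " ( ... )" that is preceded by whitespace;
# rows without such a group yield NaN instead of raising an error.
codes = df["gene_final"].str.extract(r"\s\(\s*([^()\s]+)\s*\)", expand=False)

# Prefix the matches with "ps"; fall back to the original text where nothing matched.
# "gene_short" is a hypothetical column name chosen for this example.
df["gene_short"] = ("ps" + codes).fillna(df["gene_final"])

print(df["gene_short"].tolist())
```

Writing to a separate column keeps the original descriptions available for checking the rows that did not match.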
Upvotes: 1
Reputation: 406
I think you are asking about index names (row labels). This is how you set the row names when building a DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [11, 21, 31],
'B': [12, 22, 32],
'C': [13, 23, 33]},
index=['ONE', 'TWO', 'THREE'])
print(df)
You can also change the row names after building the DataFrame, like this:
df_new = df.rename(columns={'A': 'Col_1'}, index={'ONE': 'Row_1'})
print(df_new)
# Col_1 B C
# Row_1 11 12 13
# TWO 21 22 23
# THREE 31 32 33
print(df)
# A B C
# ONE 11 12 13
# TWO 21 22 23
# THREE 31 32 33
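`rename` also accepts a function for `index`, which is applied to every row label. If the gene descriptions from the question were stored as the index rather than in a column, a sketch along these lines might work. The `count` column and the `extract_name` helper here are assumptions made up for the example, not part of the question's data.

```python
import re
import pandas as pd

def extract_name(label):
    """Return 'ps' + the bracketed code, or the label unchanged if none is found."""
    match = re.search(r"\s\(\s*([^()\s]+)\s*\)", label)
    return f"ps{match.group(1)}" if match else label

# Hypothetical frame: descriptions as row labels, with a made-up 'count' column.
df = pd.DataFrame(
    {"count": [3, 5]},
    index=["exocyst complex component 8 (EXOC8) pseudogene",
           "transmembrane protein 128 (TMEM128) pseudogene"],
)

# rename() calls extract_name on each index label and returns a new DataFrame.
df_new = df.rename(index=extract_name)
print(df_new)
```

As with the dictionary form, `df` itself is left unchanged unless the result is assigned back.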
Upvotes: 1