Pedro Alves
Pedro Alves

Reputation: 1054

Python - Pandas - Remove only splits that only numeric but maintain if it have alphabetic

I have a dataframe that have two values:

df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112']})

What I am trying to do is to remove the numeric digits in case of the split('_') only have numeric digits. The desired output is:

Table_A112
Table_A_

For that I am using the following code:

import pandas as pd
import difflib
from tabulate import tabulate
import string

df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112']})
print(tabulate(df, headers='keys', tablefmt='psql'))
df['Col2'] = df['Col1'].str.rstrip(string.digits)
print(tabulate(df, headers='keys', tablefmt='psql'))

But it gives me the following output:

Table_A
Table_A_

How can do what I want?

Thanks!

Upvotes: 6

Views: 77

Answers (4)

Andy L.
Andy L.

Reputation: 25259

I think using str.replace with capture group making the pattern much simpler

sample df

Out[1063]:
          Col1
0   Table_A112
1  Table_A_112
2  Table_111_B

df.Col1.str.replace(r'(_)\d+', r'\1')

Out[1064]:
0    Table_A112
1      Table_A_
2      Table__B
Name: Col1, dtype: object

Upvotes: 0

wp78de
wp78de

Reputation: 18980

If you insist on a regex solution you can do using pandas.replace() and a positive lookbehind r'(?<=_)\d+'

import pandas as pd
from tabulate import tabulate

df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112']})
print(tabulate(df, headers='keys', tablefmt='psql'))
df= df.replace(regex=r'(?<=_)\d+', value='')
print(tabulate(df, headers='keys', tablefmt='psql'))

Which produces the desired output.

Upvotes: 1

yatu
yatu

Reputation: 88276

Here's one way using str.replace:

df = pd.DataFrame({'Col1': ['Table_A112', 'Table_A_112', 'Table_112_avs']})

print(df)

        Col1
0     Table_A112
1    Table_A_112
2  Table_112_avs

df.Col1.str.replace(r'(?:^|_)(\d+)(?:$|_)', '_', regex=True)

0    Table_A112
1      Table_A_
2     Table_avs
Name: Col1, dtype: object

See demo

Upvotes: 4

Quang Hoang
Quang Hoang

Reputation: 150785

You can do something like:

s = df['Col1'].str.split('_',expand=True).stack()
s.mask(s.str.isdigit(), '').groupby(level=0).agg('_'.join)

Output:

0    Table_A112
1      Table_A_
dtype: object

Upvotes: 5

Related Questions