How to partially remove content from a cell in a dataframe using Python

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame([
    ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
    ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
])

which looks like this:

My goal is to analyze every single cell of the dataframe so that:

if the substring '\n' appears only once in a cell, I delete it along with all the characters that come before it;
if the substring '\n' appears more than once in a cell, I remove every occurrence along with what comes before and after them, keeping only what lies in between.
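
The two rules above can be sketched for a single string in plain Python. This is only an illustration of the intended behaviour, not code from the original post: the helper name keep_after_first_newline is hypothetical, the substring is assumed to be '\n', and for cells with three or more '\n' the sketch assumes that only the piece after the first one is kept.

```python
def keep_after_first_newline(cell: str) -> str:
    """Hypothetical helper: apply the two rules to one cell."""
    parts = cell.split('\n')
    if len(parts) == 1:
        # No '\n' in the cell: leave it unchanged.
        return cell
    # One '\n': parts[1] is everything after it.
    # Two '\n': parts[1] is what lies in between them.
    return parts[1]


# Sample data copied from the dataframe in the question.
rows = [
    ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
    ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
]
cleaned = [[keep_after_first_newline(c) for c in row] for row in rows]
print(cleaned)
```

With pandas, the same function could be applied to every cell via DataFrame.applymap (renamed DataFrame.map in recent pandas versions).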

The output of the code should be this:

Note: so far I only know how to remove what comes before or after the substring, using the following commands:

df = df.astype(str).stack().str.split('\n').str[-1].unstack()
df = df.astype(str).stack().str.split('\n').str[0].unstack()

However, these lines of code do not give me the desired result, since the output is:

Sevanteri · Accepted Answer

Use df.replace with a regular expression.

In [1]: import pandas as pd
   ...: df = pd.DataFrame([
   ...:         ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
   ...:         ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
   ...:     ])
   ...:

In [2]: df.replace(r'.*\n(.*)\n?.*', r'\1', regex=True)
Out[2]:
        0    1    2    3
0   SOVAT  DVR  MEA  195
1  GALLO   DVR       195
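
To see why this pattern works, it can help to run it on individual strings with the standard re module. The following is a minimal sketch, assuming the substring in question is '\n'. By default '.' does not match a newline, so the leading '.*' can only consume text up to the first '\n', and the group captures whatever follows it on the next line. The list cells is sample data taken from the dataframe above.

```python
import re

# Same pattern as in the answer. '.' does not match '\n' by default,
# so each '.*' stays within a single line of the cell.
pattern = re.compile(r'.*\n(.*)\n?.*')

cells = ['\nSOVAT\n', 'PINCO\nGALLO ', 'MEA\n', 'DVR']

# Replace each full match with the captured group (the text after the first '\n').
cleaned = [pattern.sub(r'\1', cell) for cell in cells]
print(cleaned)  # ['SOVAT', 'GALLO ', '', 'DVR']
```

Cells that contain no '\n' do not match the pattern at all and are therefore left unchanged.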
