Federico Gentile

Reputation: 5940

How to partially remove content from cell in a dataframe using Python

I have the following dataframe:

import pandas as pd    
df = pd.DataFrame([
        ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
        ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
    ])

which looks like this:

[image: the dataframe as displayed]

My goal is to analyze every single cell of the dataframe so that:

The output of the code should be this:

[image: the desired output dataframe]

Note: so far I only know how to remove what comes before or after the substring, using the following commands:

df = df.astype(str).stack().str.split('\n').str[-1].unstack() 
df = df.astype(str).stack().str.split('\n').str[0].unstack() 

However, these lines of code do not give me the desired result, since the output is:

[image: the output actually produced]
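To see why the two split lines fall short, here is a minimal plain-Python sketch of what they do to each cell. It assumes that `.str.split('\n').str[-1]` and `.str.split('\n').str[0]` behave like Python's own `str.split` followed by indexing; the helper name `last_then_first` is hypothetical.

```python
# Same cell values as in the dataframe above.
rows = [
    ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
    ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
]

def last_then_first(value):
    # Hypothetical helper mirroring the two lines from the question:
    # first keep the piece after the last newline, then the piece
    # before the first newline of whatever is left.
    after_last = str(value).split('\n')[-1]
    return after_last.split('\n')[0]

result = [[last_then_first(cell) for cell in row] for row in rows]
for row in result:
    print(row)
```

Cells that end in a newline, such as '\nSOVAT\n', come out empty because the last piece after splitting is the empty string, and the second line then has no newline left to split on.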

Upvotes: 1

Views: 663

Answers (1)

Sevanteri

Reputation: 4038

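Use df.replace with a regular expression. For cells like the ones in this example, the pattern keeps the text that follows the first newline (up to the next newline, if there is one) and leaves cells without a newline unchanged.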

In [1]: import pandas as pd
   ...: df = pd.DataFrame([
   ...:         ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
   ...:         ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
   ...:     ])
   ...:

In [2]: df.replace(r'.*\n(.*)\n?.*', r'\1', regex=True)
Out[2]:
        0    1    2    3
0   SOVAT  DVR  MEA  195
1  GALLO   DVR       195
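The pattern can be checked cell by cell without pandas. This is a minimal sketch that assumes df.replace with regex=True applies the substitution to each string cell the way re.sub does; the helper name `clean_cell` is hypothetical.

```python
import re

# Same pattern and replacement as in the df.replace call above:
#   .*     anything before the first newline (dot does not match '\n')
#   \n     the first newline
#   (.*)   group 1: the text right after that newline
#   \n?.*  an optional second newline and whatever follows it
PATTERN = re.compile(r'.*\n(.*)\n?.*')

def clean_cell(value):
    # Hypothetical helper: non-string values are passed through untouched.
    if not isinstance(value, str):
        return value
    return PATTERN.sub(r'\1', value)

rows = [
    ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
    ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
]
cleaned = [[clean_cell(cell) for cell in row] for row in rows]
for row in cleaned:
    print(row)
```

Note that 'MEA\n' becomes an empty string because nothing follows its newline, which is the blank cell in row 1 of the output above. 'GALLO ' also keeps its trailing space; a follow-up such as .str.strip() on the relevant columns would be needed to remove it.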

Upvotes: 2
