I have the following dataframe:
import pandas as pd
df = pd.DataFrame([
['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
])
which looks like this:
My goal is to process every cell of the dataframe as follows:
- if \n appears only once in a cell, I delete it along with all the characters that come before it;
- if \n appears more than once in a cell, I remove every \n along with whatever comes before the first one and after the last one, keeping only what lies in between.

The output of the code should be this:
Note: so far I only know how to remove what comes before or after the substring, using the following commands:
# keep only the text after the last \n
df = df.astype(str).stack().str.split('\n').str[-1].unstack()
# keep only the text before the first \n
df = df.astype(str).stack().str.split('\n').str[0].unstack()
However, these commands do not give me the desired result, since the output is:
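To make the failure concrete, here is a minimal sketch (assuming pandas is installed) that runs each split command separately on the sample frame instead of reassigning df twice; the names after_last and before_first are illustrative. Splitting '\nSOVAT\n' on '\n' gives ['', 'SOVAT', ''], so neither the first nor the last piece holds the text, and both commands turn that cell (and '\n195\n') into an empty string.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame([
    ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
    ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
])

# First command: keep only the piece after the last '\n'
after_last = df.astype(str).stack().str.split('\n').str[-1].unstack()

# Second command: keep only the piece before the first '\n'
before_first = df.astype(str).stack().str.split('\n').str[0].unstack()

print(after_last)
print(before_first)
```

The first command happens to handle the single-newline cells ('PINCO\nGALLO ' and 'MEA\n'), while the second one leaves them untouched; neither keeps the text sitting between two newlines.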
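Use df.replace with a regex. Since . does not match a newline, .*\n consumes everything up to and including the first \n, (.*) captures the text that follows it on the same line, and \n?.* consumes an optional second \n plus whatever comes after it; each match is then replaced by the captured group \1.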
In [1]: import pandas as pd
...: df = pd.DataFrame([
...: ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
...: ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
...: ])
...:
In [2]: df.replace(r'.*\n(.*)\n?.*', r'\1', regex=True)
Out[2]:
       0    1    2    3
0  SOVAT  DVR  MEA  195
1  GALLO  DVR       195
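The question's two rules can also be written out without a regex. Below is a minimal sketch, assuming pandas is installed; clean_cell is a hypothetical helper, and its handling of three or more newlines (keep everything between the first and last \n and drop the inner ones) is an assumption, since the question does not specify that case. It applies the helper column by column with Series.map and compares the result with the df.replace call above.

```python
import pandas as pd

df = pd.DataFrame([
    ['\nSOVAT\n', 'DVR', 'MEA', '\n195\n'],
    ['PINCO\nGALLO ', 'DVR', 'MEA\n', '195'],
])


def clean_cell(value):
    """Apply the question's two rules to one cell (hypothetical helper)."""
    if not isinstance(value, str):
        return value  # leave non-string cells untouched
    parts = value.split('\n')
    if len(parts) == 2:
        # exactly one '\n': drop it and everything before it
        return parts[1]
    if len(parts) > 2:
        # two or more '\n': keep only what lies between the first and the last,
        # also dropping any inner '\n' (assumption for 3+ newlines)
        return ''.join(parts[1:-1])
    return value  # no '\n' at all: unchanged


# Series.map applies clean_cell to every cell of each column
rules = df.apply(lambda col: col.map(clean_cell))

# The regex from the answer, for comparison
regex = df.replace(r'.*\n(.*)\n?.*', r'\1', regex=True)

print(rules)
print(rules.values.tolist() == regex.values.tolist())
```

On the sample frame both approaches agree. They can differ on cells with three or more newlines, because the regex may match more than once inside a single cell.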