Issue with removing \n from pandas dataframe

Question

I am trying to get rid of all \n characters in a whole pandas dataframe. I know there are already answers to this on Stack Overflow, but for some reason I can't get the desired output. I have the following dataframe:

  title     text    date    authors
0   [ECB completes foreign reserves investment in ...   [\nThe European Central Bank (ECB) completed an ...     [13 June 2017]  ECB
1   [Measures to improve the efficiency of the ope...   [\nThe Governing Council of the ECB has decided ...     [\n 23 January 2003 \n ]    ECB
2   []  []  []  ECB
3   [ECB publishes the results of the Euro Money M...   [Today the European Central Bank (ECB) is publ...   [\n 28 September 2012 \n ]  ECB
4   []  []  []  ECB

This is my desired output:

title   text    date    authors
0   [ECB completes foreign reserves investment in...    [The European Central Bank (ECB) completed an ...   [13 June 2017]  ECB
1   [Measures to improve the efficiency of the ope...   [The Governing Council of the ECB has decided ...   [23 January 2003]   ECB
2   []  []  []  ECB
3   [ECB publishes the results of the Euro Money M...   [Today the European Central Bank (ECB) is publ...   [28 September 2012]     ECB
4   []  []  []  ECB
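For anyone trying to reproduce this, here is a minimal sketch of a frame with the same shape. It assumes (as the accepted answer below also does) that each title/text/date cell is a Python list of strings and that the newlines sit inside those strings; the sample values are hypothetical and shortened from the output above.

```python
import pandas as pd

# Hypothetical sample data mimicking the frame above: list-valued cells whose
# strings carry embedded newline characters; empty lists stand for missing data.
df = pd.DataFrame({
    "title": [["ECB completes foreign reserves investment"],
              ["Measures to improve the efficiency"], []],
    "text": [["\nThe European Central Bank (ECB) completed"],
             ["\nThe Governing Council of the ECB"], []],
    "date": [["13 June 2017"], ["\n23 January 2003\n"], []],
    "authors": ["ECB", "ECB", "ECB"],
})

# The newline is inside a list element, not in a plain string cell.
print(type(df.loc[1, "date"]))       # <class 'list'>
print(repr(df.loc[1, "date"][0]))    # '\n23 January 2003\n'
```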

Here is everything I tried:

Based on this Stack Overflow post, I tried:

mydf=df.replace({r'\n': ''}, regex=True)

mydf=df['date'].str.strip(r'\n') #this turns every obs into NaN 

mydf=df.replace(to_replace=[r"\n", "\n"], value=["",""], regex=True, inplace =True) #this gets rid of all data in dataframe for some reason

None of these worked.

Based on this post, I tried the following (note that I am skipping answers I had already tried):

mydf=df.replace(r'\s', '', regex = True, inplace = True) #this deleted all data

Based on this post, I tried:

mydf=df.replace('\n',' ')

Based on the comments on this post, I tried:

mydf=df['date'].replace(r'\s+|\n', ' ', regex=True, inplace=True) and

mydf=df.replace(r'\s+|\n', ' ', regex=True, inplace=True)

Based on the answers in this post, I tried:

mydf= df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r' ', ' ', regex=True)

mydf=df.replace({ r'\A\s+|\s+\Z': '', ' ' : ' '}, regex=True, inplace=True) # this again deleted whole df

I don't understand why the answers there don't work in my case, since they were accepted and most of those questions seem very similar to mine.
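A small sketch can show why those answers behave differently here, assuming the cells hold lists of strings (hypothetical one-row data below). Regex replacement in `DataFrame.replace` only rewrites string cells, so strings nested inside lists are left alone; `.str.strip` is not defined for lists, so each list cell becomes NaN; and `inplace=True` modifies `df` and returns `None`, so assigning its result makes `mydf` `None` rather than deleting any data.

```python
import pandas as pd

# Hypothetical one-row frame: the date cell is a list holding a string
# with embedded newlines (assumed to match the question's data).
df = pd.DataFrame({"date": [["\n23 January 2003\n"]], "authors": ["ECB"]})

# 1) Regex replace only touches string cells; the list cell is unchanged.
replaced = df.replace({r"\n": ""}, regex=True)
print(replaced.loc[0, "date"])             # ['\n23 January 2003\n']

# 2) .str.strip() has no meaning for a list, so every list cell becomes NaN.
print(df["date"].str.strip().isna().all())  # True

# 3) inplace=True mutates df in place and returns None.
mydf = df.replace(r"\s", "", regex=True, inplace=True)
print(mydf)                                 # None
```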

Georgina Skibinski · Accepted Answer

Try:

df['date']=df['date'].str[0].str.replace(r"\n", "", regex=True)

This assumes that each cell in the date column is a list with only one element. It also flattens the column, so you get a string from that single element.
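A quick check of the one-element approach on hypothetical data (assumed: one-element lists, with an empty list standing for a missing date). `.str[0]` returns NaN for an empty list, and that NaN passes through the replace unchanged.

```python
import pandas as pd

# Hypothetical data: one-element lists, plus an empty list for a missing date.
df = pd.DataFrame({"date": [["13 June 2017"], ["\n23 January 2003\n"], []]})

# Take the first element of each list, then strip newlines via regex.
df["date"] = df["date"].str[0].str.replace(r"\n", "", regex=True)
print(df["date"].tolist())  # ['13 June 2017', '23 January 2003', nan]
```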

If, however, date can contain more than one element and you want to merge them all into a single string after getting rid of all \n, try:

df['date']=df['date'].str.join('').str.replace(r"\n", "", regex=True)
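A sketch of the join variant on hypothetical multi-element lists. `.str.join('')` concatenates each list's elements (an empty list becomes an empty string), and the newlines are then removed from the merged string.

```python
import pandas as pd

# Hypothetical data: a date split across two list elements, a one-element
# list, and an empty list.
df = pd.DataFrame({"date": [["\n23 January", " 2003\n"], ["13 June 2017"], []]})

# Merge each list into one string, then drop the newline characters.
df["date"] = df["date"].str.join("").str.replace(r"\n", "", regex=True)
print(df["date"].tolist())  # ['23 January 2003', '13 June 2017', '']
```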

Otherwise, if you wish to keep the list format and just strip \n from all elements, try this (with && as an interim separator):

df['date']=df['date'].str.join(r'&&').str.replace(r"\n", "", regex=True).str.split(r'&&')
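The list-preserving version can be checked the same way on hypothetical data, assuming `&&` never appears in the real text. For comparison, the second half shows an alternative that is not part of the answer above: cleaning each element with a list comprehension via `Series.apply`, which needs no separator and keeps an empty list as `[]` instead of turning it into `['']`.

```python
import pandas as pd

# Hypothetical data; '&&' is assumed never to occur in the real values.
df = pd.DataFrame({"date": [["\n23 January", "2003\n"], ["13 June 2017"], []]})

# Answer's approach: join with a temporary separator, drop newlines, split back.
via_sep = (df["date"].str.join("&&")
           .str.replace(r"\n", "", regex=True)
           .str.split("&&"))
print(via_sep.tolist())    # [['23 January', '2003'], ['13 June 2017'], ['']]

# Alternative (not from the answer): clean each list element directly.
via_apply = df["date"].apply(lambda items: [s.replace("\n", "") for s in items])
print(via_apply.tolist())  # [['23 January', '2003'], ['13 June 2017'], []]
```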
