Reputation: 209
I am trying to remove every \n from an entire pandas DataFrame. I know there are already answers to this on Stack Overflow, but for some reason I can't get the desired output. I have the following DataFrame:
title text date authors
0 [ECB completes foreign reserves investment in ... [\nThe European Central Bank (ECB) completed an ... [13 June 2017] ECB
1 [Measures to improve the efficiency of the ope... [\nThe Governing Council of the ECB has decided ... [\n 23 January 2003 \n ] ECB
2 [] [] [] ECB
3 [ECB publishes the results of the Euro Money M... [Today the European Central Bank (ECB) is publ... [\n 28 September 2012 \n ] ECB
4 [] [] [] ECB
This is my desired output:
title text date authors
0 [ECB completes foreign reserves investment in... [The European Central Bank (ECB) completed an ... [13 June 2017] ECB
1 [Measures to improve the efficiency of the ope... [The Governing Council of the ECB has decided ... [23 January 2003] ECB
2 [] [] [] ECB
3 [ECB publishes the results of the Euro Money M... [Today the European Central Bank (ECB) is publ... [28 September 2012] ECB
4 [] [] [] ECB
This is all the code I tried.
Based on this Stack Overflow post, I tried:
mydf=df.replace({r'\\n': ''}, regex=True)
mydf=df['date'].str.strip(r'\\n') #this turns every obs into NaN
mydf=df.replace(to_replace=[r"\\n", "\n"], value=["",""], regex=True, inplace =True) #this gets rid of all data in dataframe for some reason
None of these worked.
Based on this post, I tried the following (I am skipping answers that I had already tried):
mydf=df.replace(r'\s', '', regex = True, inplace = True) #this deleted all data
Based on this post, I tried:
mydf=df.replace('\\n',' ')
Based on the comments on this post, I tried:
mydf=df['date'].replace(r'\s+|\\n', ' ', regex=True, inplace=True)
and
mydf=df.replace(r'\s+|\\n', ' ', regex=True, inplace=True)
Based on the answers in this post, I tried:
mydf= df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
mydf=df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True) # this again deleted whole df
I don't understand why the answers found there do not work in my case, since they were accepted and most of those questions seem very similar to mine.
Upvotes: 3
Views: 1544
Reputation: 473
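import pandas as pd

# Sample frame whose cells are lists, mirroring the data in the question
d = {'col1': [['\n a b c'], ['\n x y z']], 'col2': [[1.5000], ['\n x y z']]}
df20 = pd.DataFrame(data=d)
print(df20)

def remove_spec_char(string_list):
    # Return a copy of the list with '\n' removed from every string element
    y = []
    for string_x in string_list:
        if isinstance(string_x, str):
            y.append(string_x.replace('\n', ''))
        else:
            # Leave non-string elements (e.g. floats) untouched
            y.append(string_x)
    return y

for c in df20.columns:
    df20[c] = df20[c].apply(remove_spec_char)
print(df20)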
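A more compact variant of the same idea, as a sketch rather than a definitive implementation. The helper name `clean_cell` and the sample data are hypothetical, and stripping the leftover spaces is an assumption based on the desired output in the question (for example `[23 January 2003]`).

```python
import pandas as pd

def clean_cell(cell):
    # Hypothetical helper: drop newlines and surrounding whitespace
    # from each string inside a list cell.
    if not isinstance(cell, list):
        return cell  # leave scalars such as 'ECB' as they are
    return [x.replace('\n', '').strip() if isinstance(x, str) else x
            for x in cell]

# Made-up sample data shaped like the frame in the question
df = pd.DataFrame({
    'date': [['\n 23 January 2003 \n '], [], ['13 June 2017']],
    'authors': ['ECB', 'ECB', 'ECB'],
})

cleaned = df.copy()
for c in cleaned.columns:
    # Series.map calls the helper once per cell
    cleaned[c] = cleaned[c].map(clean_cell)
print(cleaned)
```

Empty lists and plain strings pass through unchanged, so the `authors` column is not affected.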
Upvotes: 1
Reputation: 13377
Try:
df['date']=df['date'].str[0].str.replace("\n", "")
This assumes that each cell in the date column is a list with only one element. It will also flatten the cell, so you get a plain string from that single element.
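To illustrate what `.str[0]` does on list cells, here is a small sketch with made-up sample data (the Series name `dates` is hypothetical). As far as I can tell, an empty list has no first element, so those rows come back as NaN. The extra `.str.strip()` is an assumption based on the desired output in the question.

```python
import pandas as pd

# Made-up sample data: one list per cell, including an empty list
dates = pd.Series([['\n 23 January 2003 \n '], [], ['13 June 2017']])

# .str[0] picks the first element of each list; the empty list yields NaN
first = dates.str[0]

# "\n" here is a real newline character; .str.strip() drops leftover spaces
flat = first.str.replace("\n", "").str.strip()
print(flat.tolist())
```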
If, however, date can contain more than one element and you want to merge them all into a single string after removing every \n, try:
df['date']=df['date'].str.join('').str.replace("\n", "")
Otherwise, if you wish to keep the list format and just strip \n from every element, try (with && as an interim separator):
df['date']=df['date'].str.join('&&').str.replace("\n", "").str.split('&&')
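As a side note on the attempts in the question, the sketch below (made-up sample data; `strings` and `lists` are hypothetical names) shows two effects that may explain the "deleted all data" and NaN results. In pandas 1.x and 2.x, `replace(..., inplace=True)` modifies the frame and returns `None`, so assigning its result to `mydf` leaves `mydf` empty; newer releases may behave differently. Separately, the `.str` string methods return NaN for cells that are lists rather than strings.

```python
import pandas as pd

# A frame with plain string cells (not lists)
strings = pd.DataFrame({'date': ['\n 23 January 2003 \n ']})

# With inplace=True the frame itself is changed
result = strings.replace('\n', '', regex=True, inplace=True)
print(result)    # None on pandas 1.x / 2.x, so `mydf = ...` would hold None
print(strings)   # the original frame now has the newlines removed

# A Series whose cells are lists, as in the question
lists = pd.Series([['\n 23 January 2003 \n ']])

# .str methods expect string cells; a list cell comes back as NaN
stripped = lists.str.strip()
print(stripped.isna().all())
```

So either drop `inplace=True` and keep the assignment, or keep `inplace=True` and drop the assignment, and convert the list cells to strings (or map over the lists) before using `.str` methods.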
Upvotes: 2