Reputation: 953
I applied a transformation to one column of an Excel file. Now I would like to export that processed column together with all the other columns, which were left unchanged.
My data (small example):
A B C
French house Phone <phone_numbers>
English house email [email protected]
French apartment my name is Liam
French house Hello George
English apartment Ethan, my phone is <phone_numbers>
My script:
import re
import pandas as pd
from pandas import Series
df = pd.read_excel('data.xlsx')
data = Series.to_string(df['C'])
def emails(data):
    mails = re.compile(r'[\w\.-]+@[\w\.-]+')
    replace_mails = mails.sub('<adresse_mail>', data)
    return replace_mails
no_mails = emails(data)
no_mails.to_excel('new_data.xlsx')
My output:
AttributeError Traceback (most recent call last)
<ipython-input-7-8fd973998937> in <module>()
7
8 no_mails = emails(data)
----> 9 no_mails.to_excel('new_data.xlsx')
AttributeError: 'str' object has no attribute 'to_excel'
Expected output:
A B C
French house Phone <phone_numbers>
English house email <adresse_mail>
French apartment my name is Liam
French house Hello George
English apartment Ethan, my phone is <phone_numbers>
My script works fine except for the last line,
no_mails.to_excel('new_data.xlsx')
which raises the error shown above.
Upvotes: 1
Views: 6014
Reputation: 461
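Try building a DataFrame from the result before exporting:
# Wrap the string in a list so the DataFrame gets one row;
# assigning a scalar to an empty DataFrame would leave it empty.
# Note that the whole string ends up in a single cell.
no_mails = pd.DataFrame({'email': [emails(data)]})
no_mails.to_excel('new_data.xlsx')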
Upvotes: 2
Reputation: 739
It looks like your function returns a string, and a string has no to_excel method. You should turn the result into a DataFrame first.
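A quick check can make the cause of the error visible. The sketch below uses a made-up two-row Series in place of the asker's column C (the e-mail address is hypothetical) and only inspects the types involved:

```python
import re
import pandas as pd

# Made-up stand-in for column C; the address is hypothetical.
col = pd.Series(["email someone@example.com", "Hello George"])

text = col.to_string()  # renders the whole Series as one plain string
cleaned = re.sub(r"[\w\.-]+@[\w\.-]+", "<adresse_mail>", text)

print(type(text).__name__)           # str
print(hasattr(cleaned, "to_excel"))  # False
```

Once the column has been flattened with to_string, the row structure is gone, so there is nothing left for pandas to export.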
If you want to apply a regular expression to a DataFrame column, you could try this:
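# Collect the e-mail addresses found in each row of column C.
result = df['C'].str.findall(r'[\w\.-]+@[\w\.-]+')
# The context manager saves and closes the file on exit.
with pd.ExcelWriter('new_data.xlsx') as writer:
    result.to_excel(writer, sheet_name='Sheet 1')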
Upvotes: 1
Reputation: 1352
to_excel is a pandas DataFrame method (see the pandas documentation). You should perform your substitution on the DataFrame instead of on a column extracted as a string (as you did with Series.to_string(df['C'])).
Stick to data frames and you should be good.
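One way to follow this advice while keeping the original emails function is to apply it cell by cell. This is only a sketch: the sample rows and the e-mail address are made up, and it assumes every cell in column C is a string (no empty cells).

```python
import re
import pandas as pd

MAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")

def emails(text):
    """Replace every e-mail address in one cell with a placeholder."""
    return MAIL_RE.sub("<adresse_mail>", text)

# Hypothetical rows modelled on the question's sample data.
df = pd.DataFrame({
    "A": ["French", "English"],
    "B": ["house", "house"],
    "C": ["Hello George", "email someone@example.com"],
})

# Apply the function to each cell; the result is still a column of df.
df["C"] = df["C"].apply(emails)
print(df)
```

Because df is still a DataFrame here, df.to_excel('new_data.xlsx') can be called on it directly, and columns A and B are written out unchanged.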
Upvotes: 1
Reputation: 7430
You can use str.replace on a pandas Series:
# regex=True is required for the pattern to be treated as a regular expression
df['C'] = df['C'].str.replace(r'[\w\.-]+@[\w\.-]+', '<adresse_mail>', regex=True)
df.to_excel('new_data.xlsx')
Upvotes: 2