marin

Reputation: 953

Exporting data with pandas

I applied a transformation to one column of an Excel file. Now I would like to export the transformed column together with all the other columns that were left untouched.

My data (small example):

       A          B                                    C
  French      house                Phone <phone_numbers>
 English      house            email [email protected]
  French  apartment                      my name is Liam
  French      house                         Hello George
 English  apartment   Ethan, my phone is <phone_numbers>

My script:

import re
import pandas as pd
from pandas import Series

df = pd.read_excel('data.xlsx')
data = Series.to_string(df['C'])

def emails(data):

    mails = re.compile(r'[\w\.-]+@[\w\.-]+')
    replace_mails = mails.sub('<adresse_mail>', data)

    return replace_mails

no_mails = emails(data)
no_mails.to_excel('new_data.xlsx')

My output:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-8fd973998937> in <module>()
      7 
      8 no_mails = emails(data)
----> 9 no_mails.to_excel('new_data.xlsx')

AttributeError: 'str' object has no attribute 'to_excel'

Good output:

       A          B                                    C
  French      house                Phone <phone_numbers>
 English      house                 email <adresse_mail>
  French  apartment                      my name is Liam
  French      house                         Hello George
 English  apartment   Ethan, my phone is <phone_numbers>

My script works fine, except that

no_mails.to_excel('new_data.xlsx')

fails with the error above.

Upvotes: 1

Views: 6014

Answers (4)

Malik Asad

Reputation: 461

Try this

no_mails = pd.DataFrame({'email': []})
no_mails['email'] = emails(data)
no_mails.to_excel('new_data.xlsx')

Upvotes: 2

Snedecor

Reputation: 739

It looks like your function returns a string, and a string has no to_excel method. You need a DataFrame (or Series) to export.

If you want to apply a regular expression to a DataFrame column, you should try this:

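# Extract the e-mail addresses found in each cell of column C
result = df['C'].str.findall(r'[\w\.-]+@[\w\.-]+')
# The context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('new_data.xlsx') as writer:
    result.to_excel(writer, sheet_name='Sheet 1')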

Upvotes: 1

Tomasz Sabała

Reputation: 1352

to_excel is a pandas DataFrame method (see the pandas documentation). You should perform your substitution on the data frame itself instead of on a column extracted as a string (as you did with Series.to_string(df['C'])).

Stick to data frames and you should be good.
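A minimal sketch of that advice, assuming you want to keep a function like the one in the question. The name mask_emails and the sample address are made up for illustration, and the small DataFrame stands in for pd.read_excel('data.xlsx'):

```python
import re

import pandas as pd

# Same pattern as in the question, compiled once.
EMAIL_PATTERN = re.compile(r'[\w\.-]+@[\w\.-]+')


def mask_emails(text):
    """Replace every e-mail-like token in one cell with a placeholder."""
    # Leave non-string cells (NaN, numbers) untouched.
    if not isinstance(text, str):
        return text
    return EMAIL_PATTERN.sub('<adresse_mail>', text)


# Stand-in for pd.read_excel('data.xlsx'); the address is invented.
df = pd.DataFrame({
    'A': ['French', 'English'],
    'B': ['house', 'house'],
    'C': ['my name is Liam', 'email someone@example.com'],
})

# Apply the function cell by cell. The result is still a Series,
# so df remains a DataFrame and keeps its to_excel method.
df['C'] = df['C'].map(mask_emails)

# Uncomment to export (needs an Excel engine such as openpyxl):
# df.to_excel('new_data.xlsx', index=False)
```

Because the column is never converted with Series.to_string, columns A and B are exported along with the cleaned column C.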

Upvotes: 1

Franco Piccolo

Reputation: 7430

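You can use str.replace on a pandas Series (pass regex=True, because since pandas 2.0 the pattern is treated as a literal string by default):

df['C'] = df['C'].str.replace(r'[\w\.-]+@[\w\.-]+', '<adresse_mail>', regex=True)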
df.to_excel('new_data.xlsx')

Upvotes: 2
