LancelotHolmes
LancelotHolmes

Reputation: 699

python get and process value between certain characters in csv

Actually it's related to pandas, I have several records(about 40000) in a csv file, and one of the column value is market_value,for example:

Original records

player_id,player_name,club,age,market_value,date
3713,massimo-maccarone,UC Sampdoria,31,£2.55m,2011-01-10
3713,massimo-maccarone,UC Sampdoria,31,£1.70m,2011-06-21
3713,massimo-maccarone,FC Empoli,32,£850k,2012-01-01
3713,massimo-maccarone,FC Empoli,32,£680k,2012-06-15
...

here I'd like to extract the value of the market_value and convert it into some format like

player_id,player_name,club,age,market_value,date
3713,massimo-maccarone,UC Sampdoria,31,2550000,2011-01-10
3713,massimo-maccarone,UC Sampdoria,31,1700000,2011-06-21
3713,massimo-maccarone,FC Empoli,32,850000,2012-01-01
3713,massimo-maccarone,FC Empoli,32,680000,2012-06-15
...

which means I'd like to convert it into a numeric form and then import to database.And I have written a regex method to implement this like:

import re

def regex_format(test_str):
    if re.search('^£.+k$',test_str):
        return float(re.search('^£(.+)k$',test_str).group(1))*1000
    elif re.search('^£(.+)m$',test_str):
        return float(re.search('^£(.+)m$',test_str).group(1))*1000000
    else:
        return 0.0

it works for those string,but when I tried to apply it in csv file by pandas,I got a problem,my code is like:

import pandas as pd
import currency_format

df=pd.read_csv('the_market_value_o.csv')
df['market_value']=currency_format.regex_format(df['market_value'])
print df

and I got an exception like:

TypeError:expected string or buffer

and then I tried to convert the series to string by methods like:

df['market_value'].astype(basestring)

but it doesn't work.

BTW,my python version is Python 2.7.12 :: Anaconda 4.0.0 (64-bit) with pandas 0.18.0

Any help is appreciated,thank you.

Upvotes: 1

Views: 44

Answers (1)

Turix
Turix

Reputation: 4490

I think the problem is that df[market_value] is a Series (of strings) not a singleton string. Think of it as a slice/column of your data -- like a list of values. So instead try something like:

df['market_value'].map(currency_format.regex_format)

This answer here has more such functions/examples.

Upvotes: 1

Related Questions