jroc

Reputation: 91

How to extract the characters from a string that are inside parentheses?

I have a DataFrame with one column named contracting and another named contractor.

I need to split a column such as contractor into 2 new columns: one containing the Fiscal Number that is inside the parentheses, and another containing everything else (the description).

Example:

Contractor: Meo(504615947)

I need it to become:

Contractor_Name: Meo and Contractor_Number: 504615947

I tried to do this:

proc2013[['contractor_description', 'contractor_NIF']] = pd.DataFrame(proc2013['contractor'].str.split('(', n=1).tolist())

proc2013['contractor_NIF'] = proc2013.contractor_NIF.str.extract(r'(\d+)')

Problem 1:

The name can also contain a description in parentheses, followed by the parenthesised number that I am trying to extract.

Problem 2:

Sometimes, if the contractor is from a foreign country, the Fiscal Number has a letter at the beginning (it is not digits only, as I assumed at first in my second line of code).

All Fiscal Numbers have 9 digits.

Upvotes: 2

Views: 161

Answers (2)

Akash Ranjan

Reputation: 1074

As far as I understand your question, this is one possible solution:

df['contractor_name']=list(map(lambda x : x.split('(')[0],df['contractor']))
df['contractor_number']=list(map(lambda x : x.split('(')[-1][-10:-1],df['contractor']))
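
The same idea can be sketched with pandas string methods instead of map. Splitting on the last opening parenthesis with str.rsplit keeps any parenthesised description as part of the name. The sample rows are made up for illustration, and the sketch assumes the Fiscal Number is always the final parenthesised group:

```python
import pandas as pd

# Hypothetical sample data; the real column comes from the asker's DataFrame.
df = pd.DataFrame({"contractor": ["Meo(504615947)", "Acme (Lisbon)(A12345678)"]})

# Split once, starting from the right, so only the last "(" is used.
parts = df["contractor"].str.rsplit("(", n=1, expand=True)

df["contractor_name"] = parts[0].str.strip()        # text before the last "("
df["contractor_number"] = parts[1].str.rstrip(")")  # drop the closing ")"
```

Unlike slicing a fixed number of characters, this does not depend on the length of the number.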

Hope this helps.

Upvotes: 2

Franco Piccolo

Reputation: 7420

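You could change \d to \w to match any alphanumeric character, and apply the pattern to the original column, anchored at the end so that only the last parenthesised group is captured:

proc2013['contractor_NIF'] = proc2013['contractor'].str.extract(r'\((\w+)\)$', expand=False)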
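
Both columns can also be pulled out in one pass with named groups. This is only a sketch on made-up rows: it assumes the Fiscal Number is the last parenthesised group in the string, and it keeps the permissive \w+ because the exact format of foreign numbers is not stated in the question (the pattern could be tightened once that format is known).

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's proc2013 DataFrame.
proc2013 = pd.DataFrame(
    {"contractor": ["Meo(504615947)", "Acme (Lisbon)(A12345678)", "No number here"]}
)

# Greedy ".*" backtracks to the last "(" that is followed by word characters
# and a closing ")" at the end of the string.
pattern = r"^(?P<contractor_description>.*)\((?P<contractor_NIF>\w+)\)\s*$"

extracted = proc2013["contractor"].str.extract(pattern)
extracted["contractor_description"] = extracted["contractor_description"].str.strip()

# Rows that do not match the pattern get NaN in both new columns.
proc2013 = proc2013.join(extracted)
```

Named groups become the column names of the extracted frame, so no separate renaming step is needed.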

Upvotes: 2
