Reputation: 91
Picture of the DataFrame:
My DataFrame has one column named contracting and another named contractor.
I need to split a column, for example contractor, into 2 new columns: one containing the fiscal number inside the parentheses, and another containing everything else (the description).
Example:
Contractor: Meo(504615947)
I need it to become:
Contractor_Name: Meo and Contractor_Number: 504615947
I tried to do this:
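# Split only at the first "(" and keep the original index (expand=True instead of rebuilding a DataFrame from a list)
proc_2013[['contractor_description', 'contractor_NIF']] = proc_2013['contractor'].str.split('(', n=1, expand=True)
# Keep only the digits of the fiscal number (raw string avoids the invalid "\d" escape warning)
proc_2013['contractor_NIF'] = proc_2013['contractor_NIF'].str.extract(r'(\d+)', expand=False)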
Problem 1:
The name itself can also contain a description in parentheses, followed by the parenthesized number I am trying to extract, so splitting at the first "(" cuts the name in the wrong place.
Problem 2:
If the contractor is from a foreign country, the fiscal number can start with a letter, so it is not purely numeric as my second line of code assumes.
All fiscal numbers have 9 digits.
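Both problems can be handled with a single regular expression anchored at the end of the string: the name is everything before the last parenthesized group, and the fiscal number is that last group. The sketch below is one possible approach, not the asker's code; it assumes the number is an optional single leading letter followed by exactly 9 digits, and the sample rows other than "Meo(504615947)" are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only "Meo(504615947)" comes from the question.
df = pd.DataFrame({
    "contractor": [
        "Meo(504615947)",
        "Some Company (Lisbon)(123456789)",
        "Foreign Ltd(A123456789)",
    ]
})

# Assumption: fiscal number = optional leading letter + exactly 9 digits.
# The lazy ".*?" grows until the remainder matches, and "$" forces the
# number group to be the LAST "(...)", so earlier parentheses stay in the name.
pattern = r"^(?P<contractor_name>.*?)\s*\((?P<contractor_NIF>[A-Za-z]?\d{9})\)\s*$"

# Named groups become the column names of the extracted DataFrame.
parts = df["contractor"].str.extract(pattern)
df = df.join(parts)
print(df[["contractor_name", "contractor_NIF"]])
```

Rows that do not fit the assumed format come back as NaN in both new columns, which makes them easy to find and inspect.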
Upvotes: 2
Views: 161
Reputation: 1074
As far as I understand your question, this is a possible solution:
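# Everything before the LAST "(" is the name, so parentheses inside the name are kept
df['contractor_name'] = list(map(lambda x: x.rsplit('(', 1)[0].strip(), df['contractor']))
# Everything after the last "(" without the closing ")" is the fiscal number (letters included)
df['contractor_number'] = list(map(lambda x: x.rsplit('(', 1)[-1].rstrip(')'), df['contractor']))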
Hope this helps.
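The same split-at-the-last-parenthesis idea can also be written with pandas' vectorized string methods instead of map and lambda. This is a sketch assuming every value ends with "(...)"; the second sample row is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only "Meo(504615947)" comes from the question.
df = pd.DataFrame({"contractor": ["Meo(504615947)", "Some Company (Lisbon)(123456789)"]})

# rsplit with n=1 splits only at the LAST "(", giving two columns (0 and 1).
parts = df["contractor"].str.rsplit("(", n=1, expand=True)
df["contractor_name"] = parts[0].str.strip()
# Drop the trailing ")" from the number part.
df["contractor_number"] = parts[1].str.rstrip(")")
print(df)
```

Working on the whole column at once keeps the original index aligned and avoids building intermediate Python lists.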
Upvotes: 2
Reputation: 7420
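You could change \d to \w to match any alphanumeric character (note that \w also matches underscores). Apply it to the original contractor column and anchor it to the end of the string, so that only the last parenthesized group is taken:
proc_2013['contractor_NIF'] = proc_2013['contractor'].str.extract(r'\((\w+)\)\s*$', expand=False)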
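The difference between \d+ and \w+ shows up on a fiscal number with a letter prefix. A small sketch on hypothetical values, using end-anchored patterns (the "\s*$" anchor is an addition to keep the match on the last parenthesized group):

```python
import pandas as pd

# Hypothetical values; the second mimics a foreign fiscal number with a letter prefix.
s = pd.Series(["Meo(504615947)", "Foreign Ltd(A123456789)"])

# \d+ only accepts digits, so the letter-prefixed number does not match (NaN).
digits_only = s.str.extract(r"\((\d+)\)\s*$", expand=False)
# \w+ accepts letters, digits and underscores.
alnum = s.str.extract(r"\((\w+)\)\s*$", expand=False)
print(digits_only.tolist())
print(alnum.tolist())
```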
Upvotes: 2