Reputation: 181
How do I do this using Pandas and Python? I'm using Jupyter Notebook.
I have a dataframe like this:
col
0 2017 / something
1 $5.91 (× 1)
2 Premium
3 2017 / anotherthing
4 $16.0 (× 1)
5 Business
And I want to split out values from there into their own columns like this:
col revenue plan
0 2017 / something $5.91 (× 1) Premium
1 2017 / anotherthing $16.0 (× 1) Business
And then delete the parenthesis values, and dollar signs from the revenue column so I end up with this:
col revenue plan
0 2017 / something 5.91 Premium
1 2017 / anotherthing 16.0 Business
Upvotes: 1
Views: 201
Reputation: 294488
Using pd.MultiIndex.from_arrays
and __divmod__
We use the value of 3
because we want the resulting 3 columns.
d = df.set_index(
pd.MultiIndex.from_arrays(np.arange(len(df)).__divmod__(3))
).col.unstack().rename(columns={0: 'col', 1: 'revenue', 2: 'plan'})
d.assign(revenue=d.revenue.str.extract('\$(.*) \(', expand=False))
col revenue plan
0 2017 / something 5.91 Premium
1 2017 / anotherthing 16.0 Business
Upvotes: 3
Reputation: 1016
Adapted other partial solutions to use cleaner solutions combined to get op's requested output.
# make dataframe
df = pd.DataFrame(columns=['col'], data=['2017 / something', '$5.91 (× 1)', 'Premium', '2017 / anotherthing', '$16.0 (× 1)', 'Business'])
# break into 3 columns(per piRSquared's solution) and rename
df = df.set_index(
pd.MultiIndex.from_arrays(np.arange(len(df)).__divmod__(3))
).col.unstack().rename(columns={0: 'col', 1: 'revenue', 2: 'plan'})
# strip parenthesis values and dollar signs
df.revenue = df.revenue.replace(r'\s*\([^\)]*\)', '', regex=True).str.strip('$')
print(df)
output:
col revenue plan
0 2017 / something 5.91 Premium
1 2017 / anotherthing 16.0 Business
Upvotes: 0
Reputation: 210922
In [113]: (df[['col']].replace(r'\s*\([^\)]*\)', '', regex=True)
...: .set_index(np.arange(len(df))//3)
...: .set_index(np.arange(len(df))%3, append=True)['col']
...: .unstack())
...:
Out[113]:
0 1 2
0 2017 / something $5.91 Premium
1 2017 / anotherthing $16.0 Business
Upvotes: 4