Reputation: 181

Pandas df split values out from one column into their own columns

How do I do this using Pandas and Python? I'm using Jupyter Notebook.

I have a dataframe like this:

    col
0   2017 / something
1   $5.91 (× 1)
2   Premium
3   2017 / anotherthing
4   $16.0 (× 1)
5   Business

And I want to split out values from there into their own columns like this:

    col                    revenue        plan
0   2017 / something       $5.91 (× 1)    Premium
1   2017 / anotherthing    $16.0 (× 1)    Business

And then delete the parenthesis values, and dollar signs from the revenue column so I end up with this:

    col                    revenue     plan
0   2017 / something       5.91        Premium
1   2017 / anotherthing    16.0        Business

Upvotes: 1

Answers (3)

piRSquared

Reputation: 294488

Using pd.MultiIndex.from_arrays and __divmod__
We use the value of 3 because we want the resulting 3 columns.

d = df.set_index(
    pd.MultiIndex.from_arrays(np.arange(len(df)).__divmod__(3))
).col.unstack().rename(columns={0: 'col', 1: 'revenue', 2: 'plan'})

d.assign(revenue=d.revenue.str.extract('\$(.*) \(', expand=False))

                   col revenue      plan
0     2017 / something    5.91   Premium
1  2017 / anotherthing    16.0  Business

Upvotes: 3

ak_slick

Reputation: 1016

Adapted other partial solutions to use cleaner solutions combined to get op's requested output.

# make dataframe 
df = pd.DataFrame(columns=['col'], data=['2017 / something', '$5.91 (× 1)', 'Premium', '2017 / anotherthing', '$16.0 (× 1)', 'Business'])

# break into 3 columns(per piRSquared's solution) and rename
df = df.set_index(
    pd.MultiIndex.from_arrays(np.arange(len(df)).__divmod__(3))
    ).col.unstack().rename(columns={0: 'col', 1: 'revenue', 2: 'plan'})

# strip parenthesis values and dollar signs
df.revenue = df.revenue.replace(r'\s*\([^\)]*\)', '', regex=True).str.strip('$')
print(df)

output:

                   col revenue      plan
0     2017 / something    5.91   Premium
1  2017 / anotherthing    16.0  Business

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210922

In [113]: (df[['col']].replace(r'\s*\([^\)]*\)', '', regex=True)
     ...:             .set_index(np.arange(len(df))//3)
     ...:             .set_index(np.arange(len(df))%3, append=True)['col']
     ...:             .unstack())
     ...:
Out[113]:
                     0      1         2
0     2017 / something  $5.91   Premium
1  2017 / anotherthing  $16.0  Business

Upvotes: 4

Pandas df split values out from one column into their own columns

Answers (3)

Related Questions