Reputation: 45
I've a data frame which contains one column. Below is the example
Questionsbysortorder
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3
Q1-1,Q2-2,Q3-1,Q4-1
Q1-5,Q2-3,Q3-3
I'm trying to explode the columns with the help of already given row values. Like below is the example
Questionsbysortorder Q1 Q2 Q3 Q4 Q5
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3 4 3 2 3 3
Q1-1,Q2-2,Q3-1,Q4-1 1 2 1 1 NA
Q1-5,Q2-3,Q5-3 5 3 NA NA 3
Below is the code which i tried, but it's returning an error
import pandas as pd
import numpy as np
df = pd.DataFrame({'Questionsbysortorder': ['Q1-4,Q2-3,Q3-2,Q4-3,Q5-3', 'Q1-1,Q2-2,Q3-1,Q4-1','Q1-5,Q2-3,Q5-3']})
df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split(',')
df = df.explode('Questionsbysortorder')
df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split('-')
df = df.explode('Questionsbysortorder')
df = df.set_index('Questionsbysortorder').unstack().reset_index()
df.columns = ['Questionsbysortorder', 'value']
df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')
df.columns.name = None
print(df)
Error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-30dd8b8d4f59> in <module>()
14 df = df.set_index('Questionsbysortorder').unstack().reset_index()
15
---> 16 df.columns = ['Questionsbysortorder', 'value']
17
18 df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
56 elif new_len != old_len:
57 raise ValueError(
---> 58 f"Length mismatch: Expected axis has {old_len} elements, new "
59 f"values have {new_len} elements"
60 )
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements
Can anyone please help me with this?
Upvotes: 2
Views: 58
Reputation: 1491
another way to split the column into multiple columns is using str.findall() method, here is an example:
res = df.Questionsbysortorder.str.findall(r'(Q\d+)-(\d+)').apply(lambda x: pd.Series(dict(x)))
print(res)
'''
Q1 Q2 Q3 Q4 Q5
0 4 3 2 3 3
1 1 2 1 1 NaN
2 5 3 NaN NaN 3
Upvotes: 1
Reputation: 150745
You are very close. You want to
','
,'-'
to get the different fieldsIn code:
df.join(df.Questionsbysortorder.str.split(',')
.explode()
.str.split('-', expand=True)
.set_index(0, append=True)[1]
.unstack()
)
Output:
Questionsbysortorder Q1 Q2 Q3 Q4 Q5
0 Q1-4,Q2-3,Q3-2,Q4-3,Q5-3 4 3 2 3 3
1 Q1-1,Q2-2,Q3-1,Q4-1 1 2 1 1 NaN
2 Q1-5,Q2-3,Q3-3 5 3 3 NaN NaN
Upvotes: 2