Reputation: 7713
I have a data in a format as shown below which can be genarated using the Python code given below
df = pd.DataFrame({'Person_id':[1,2,3,4],
'Values':['father:1.Yes 2.No 3.Do not Know','Mother:1.Yes 2.No 3.Do not
Know','sons:1.Yes 2.No 3.Do not Know','daughter:1.Yes 2.No 3.Do not Know'],
'Ethnicity':['dffather','dfmother','dfson','dfdaughter']})
What I would like to do is split the 'Values' cell content into 3 separate rows/ n seperate rows based on the number of values available in the cell. In this case, we have 3 options (1.Yes, 2.No and 3. Do not Know). I don't wish to retain the text like father, mother, son etc. I only wish to have the options
How can I get my output to be like as shown below
Please note that the options and values might differ in real time. What I have shown is a sample and there is no pattern that exists in terms of answer options.
Upvotes: 1
Views: 433
Reputation: 75080
One way can be (pandas):
s=df.Values.str.findall('(\D+)').str[1:]
m=(df.reindex(df.index.repeat(s.str.len()))
.assign(new_Value=np.concatenate(s.values)).drop('Values',1))
m.new_Value=m.groupby('Person_id').cumcount().add(1).astype(str)+m.new_Value
print(m)
Person_id Ethnicity new_Value
0 1 dffather 1.Yes
0 1 dffather 2.No
0 1 dffather 3.Do not Know
1 2 dfmother 1.Yes
1 2 dfmother 2.No
1 2 dfmother 3.Do not Know
2 3 dfson 1.Yes
2 3 dfson 2.No
2 3 dfson 3.Do not Know
3 4 dfdaughter 1.Yes
3 4 dfdaughter 2.No
3 4 dfdaughter 3.Do not Know
Upvotes: 1
Reputation: 2569
From a VBA perspective, this is one way you could do it:
Option Explicit
Sub splitVals()
Dim ws As Worksheet: Set ws = ActiveWorkbook.Sheets("Sheet Name")
Dim arrData As Variant, arrValues() As String
Dim arrTmp() As String: ReDim arrTmp(1 To 2, 1 To 1)
Dim arrFinal() As String
Dim lrow As Long: lrow = ws.Cells(Rows.Count, 1).End(xlUp).Row
Dim R As Long, C As Long, X As Long, Z As Long
arrData = ws.Range("A1:C" & lrow)
For R = LBound(arrData) + 1 To UBound(arrData)
arrData(R, 2) = Replace("df" & arrData(R, 2), arrData(R, 3) & ":", "")
arrValues = Split(arrData(R, 2), ".")
For X = LBound(arrValues) To UBound(arrValues)
If X + 1 = UBound(arrValues) Then
arrValues(X) = X + 1 & "." & arrValues(X + 1)
ReDim Preserve arrValues(X)
Exit For
Else
arrValues(X) = X + 1 & "." & Left(arrValues(X + 1), Len(arrValues(X + 1)) - 2)
End If
Next X
For X = LBound(arrValues) To UBound(arrValues)
Z = Z + 1
ReDim Preserve arrTmp(1 To 2, 1 To Z)
If X = 0 Then arrTmp(1, Z) = R - 1
arrTmp(2, Z) = arrValues(X)
Next X
Next R
ReDim arrFinal(LBound(arrTmp, 2) To UBound(arrTmp, 2), LBound(arrTmp) To UBound(arrTmp))
For R = LBound(arrFinal) To UBound(arrFinal)
For C = LBound(arrFinal, 2) To UBound(arrFinal, 2)
arrFinal(R, C) = arrTmp(C, R)
Next C
Next R
With ws.Range("E1")
.Resize(UBound(arrFinal), UBound(arrFinal, 2)) = arrFinal
End With
End Sub
Upvotes: 1