The Great
The Great

Reputation: 7713

Split a single excel cell content into different cells using Python or Excel vba

I have a data in a format as shown below which can be genarated using the Python code given below

df = pd.DataFrame({'Person_id':[1,2,3,4],
'Values':['father:1.Yes 2.No 3.Do not Know','Mother:1.Yes 2.No 3.Do not 
 Know','sons:1.Yes 2.No 3.Do not Know','daughter:1.Yes 2.No 3.Do not Know'],
   'Ethnicity':['dffather','dfmother','dfson','dfdaughter']})

enter image description here

What I would like to do is split the 'Values' cell content into 3 separate rows/ n seperate rows based on the number of values available in the cell. In this case, we have 3 options (1.Yes, 2.No and 3. Do not Know). I don't wish to retain the text like father, mother, son etc. I only wish to have the options

How can I get my output to be like as shown below

enter image description here

Please note that the options and values might differ in real time. What I have shown is a sample and there is no pattern that exists in terms of answer options.

Upvotes: 1

Views: 433

Answers (2)

anky
anky

Reputation: 75080

One way can be (pandas):

s=df.Values.str.findall('(\D+)').str[1:]
m=(df.reindex(df.index.repeat(s.str.len()))
   .assign(new_Value=np.concatenate(s.values)).drop('Values',1))
m.new_Value=m.groupby('Person_id').cumcount().add(1).astype(str)+m.new_Value
print(m)

   Person_id   Ethnicity      new_Value
0          1    dffather         1.Yes 
0          1    dffather          2.No 
0          1    dffather  3.Do not Know
1          2    dfmother         1.Yes 
1          2    dfmother          2.No 
1          2    dfmother  3.Do not Know
2          3       dfson         1.Yes 
2          3       dfson          2.No 
2          3       dfson  3.Do not Know
3          4  dfdaughter         1.Yes 
3          4  dfdaughter          2.No 
3          4  dfdaughter  3.Do not Know

Upvotes: 1

FAB
FAB

Reputation: 2569

From a VBA perspective, this is one way you could do it:

Option Explicit

Sub splitVals()

Dim ws As Worksheet: Set ws = ActiveWorkbook.Sheets("Sheet Name")
Dim arrData As Variant, arrValues() As String
Dim arrTmp() As String: ReDim arrTmp(1 To 2, 1 To 1)
Dim arrFinal() As String

Dim lrow As Long: lrow = ws.Cells(Rows.Count, 1).End(xlUp).Row
Dim R As Long, C As Long, X As Long, Z As Long

arrData = ws.Range("A1:C" & lrow)

For R = LBound(arrData) + 1 To UBound(arrData)
    arrData(R, 2) = Replace("df" & arrData(R, 2), arrData(R, 3) & ":", "")
    arrValues = Split(arrData(R, 2), ".")
    For X = LBound(arrValues) To UBound(arrValues)
        If X + 1 = UBound(arrValues) Then
            arrValues(X) = X + 1 & "." & arrValues(X + 1)
            ReDim Preserve arrValues(X)
            Exit For
        Else
            arrValues(X) = X + 1 & "." & Left(arrValues(X + 1), Len(arrValues(X + 1)) - 2)
        End If
    Next X

    For X = LBound(arrValues) To UBound(arrValues)
        Z = Z + 1
        ReDim Preserve arrTmp(1 To 2, 1 To Z)
        If X = 0 Then arrTmp(1, Z) = R - 1
        arrTmp(2, Z) = arrValues(X)
    Next X
Next R

ReDim arrFinal(LBound(arrTmp, 2) To UBound(arrTmp, 2), LBound(arrTmp) To UBound(arrTmp))
For R = LBound(arrFinal) To UBound(arrFinal)
    For C = LBound(arrFinal, 2) To UBound(arrFinal, 2)
        arrFinal(R, C) = arrTmp(C, R)
    Next C
Next R

With ws.Range("E1")
    .Resize(UBound(arrFinal), UBound(arrFinal, 2)) = arrFinal
End With

End Sub

And the result: enter image description here

Upvotes: 1

Related Questions