Ahmed
Ahmed

Reputation: 119

how to extract text strings from a cell

i have text content in the first cell of each row. it is basically a paragraph. i want to split this paragraph into different cells, in the same row.

Definition: Photosynthesis is a process used by plants and other organisms to convert light energy, normally from the Sun, into chemical energy that can be later released to fuel the organisms activities. Working: In photosynthetic bacteria, the proteins that gather light for photosynthesis are embedded in cell membranes. In its simplest form, this involves the membrane surrounding the cell itself. Evolution: Early photosynthetic systems, such as those from green and purple sulfur and green and purple nonsulfur bacteria, are thought to have been anoxygenic, using various molecules as electron donors.

this content is present in cell a1. i want to split it into 3 cells

cell a2

"Definition: Photosynthesis is a process used by plants and other organisms to convert light energy, normally from the Sun, into chemical energy that can be later released to fuel the organisms activities."

cell a3

"Working: In photosynthetic bacteria, the proteins that gather light for photosynthesis are embedded in cell membranes. In its simplest form, this involves the membrane surrounding the cell itself."

cell a4

"Evolution: Early photosynthetic systems, such as those from green and purple sulfur and green and purple nonsulfur bacteria, are thought to have been anoxygenic, using various molecules as electron donors."

first cell goes on from text Definition: ---- Working: (this includes the text "Definition:" and does not include "Working:") second cell goes on from Working: ------- Evolution: (this includes the text "Working:" and does not include "Evolution:") third cell from Evolution: ------- endof string.

Upvotes: 0

Views: 463

Answers (4)

bjbk
bjbk

Reputation: 418

Using FIND() and MID() you should be able to parse this.

Disclaimer: The solution below assumes that you will always have the strings "Definition", "Working", and "Evolution" in each paragraph you wish to parse.

FIND returns a number based on the position of the character starting from the designated position, in this case, the first character, 1.

In A2 use this formula:

=LEFT(A1,FIND("Working:",A1,1)−1)

This searches for "Working:" and returns its position. In this case character number 206. The -1 omits the "W" in working from being returned as it is character #206. We want to end at the character before "W".

In A3 use this formula:

=MID(A1,FIND("Working:",A1,1),FIND("Evolution:",A1,1)-FIND("Working:",A1,1))

Now we are looking for a start and end character position. We have to use subtraction to get the string length.

In A4 use this formula:

=MID(A1,FIND("Evolution:",A1,1),FIND("Working:",A1,1))

Middle requires a start and end character position.

Bonus: to eliminate any extra white space at the beginning or end, wrap the formulas in the TRIM() function. Exampple: =TRIM(MID(A1,FIND("Working:",A1,1),FIND("Evolution:",A1,1)-FIND("Working:",A1,1)))

Result of Parse

Upvotes: 2

paul bica
paul bica

Reputation: 10705

Another VBA solution:

Option Explicit

Public Sub extractPara()
    Dim lRow As Long, ur As Variant, arr As Variant
    Dim i As Long, j As Long, x2 As Long, x3 As Long

    With ActiveSheet
        lRow = .Range("A" & .Rows.Count).End(xlUp).Row: j = 1
        ur = .Range("A1:A" & lRow).Value2
        arr = .Range("A1:A" & lRow * 4).Value2
        For i = 1 To lRow
            x2 = InStr(1, ur(i, 1), "Working:", vbBinaryCompare)
            x3 = InStr(1, ur(i, 1), "Evolution:", vbBinaryCompare)
            arr(j + 0, 1) = ur(i, 1)
            arr(j + 1, 1) = Left(ur(i, 1), x2)
            arr(j + 2, 1) = Mid(ur(i, 1), x2, x3)
            arr(j + 3, 1) = Mid(ur(i, 1), x3)
            j = j + 4
        Next
        .Range("A1:A" & lRow * 4) = arr
    End With
End Sub

Upvotes: 2

Dan Donoghue
Dan Donoghue

Reputation: 6216

Being that you tagged this as Excel-vba also I thought I would post a VBA solution for you. This UDF hopefully covers your needs.

Function SplitString(MyString As String, PartNum As Long)
Dim MyNewString As String, DefNames As Variant, DefNamesOn As Boolean
DefNamesOn = True
DefNames = Array("Definition: ", "Working: ", "Evolution: ")
MyNewString = Split(MyString, ":")(PartNum)
If DefNamesOn Then
    SplitString = DefNames(PartNum - 1)
Else
    SplitString = ""
End If
If Right(UCase(MyNewString), 8) = " WORKING" Then
    SplitString = SplitString & Trim(Left(MyNewString, Len(MyNewString) - 8))
ElseIf Right(UCase(MyNewString), 10) = " EVOLUTION" Then
    SplitString = SplitString & Trim(Left(MyNewString, Len(MyNewString) - 10))
Else
    SplitString = SplitString & Trim(MyNewString)
End If
End Function

You can change DefNamesOn = True to False if you don't want "Definition: ", "Working: " or "Evolution: " at the start

Use it like any other formula: =SplitString(A1,1) where A1 has your string and 1 is the number of the part you want (1, 2 or 3)

You can chop a bit out of the code if you always want DefNames on or off but I thought it better to give you the option in the code.

Upvotes: 2

Ahmed
Ahmed

Reputation: 119

googled alot about this. got something that worked for me.

=MID(A1,SEARCH("Definition:",A1),SEARCH("Working:",A1)-SEARCH("Definition:",A1))

this got me the result in cell a2

Working: In photosynthetic bacteria, the proteins that gather light for photosynthesis are embedded in cell membranes. In its simplest form, this involves the membrane surrounding the cell itself.

Upvotes: 0

Related Questions