Reputation: 437
I have a dataframe with the following string format.
data.description[4000]=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2" LF .050 $.86 $1.90 $2.76']
The string varies in length, but I would like to break it up by splitting at the ' LF ' substring. The desired output would be:
data2=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"','LF',.050,'$.86','$1.90','$2.76']
If I were to have a list of units
units=['CLF','LF','EA']
How could I search the dataframe string and break it up in the format above? Splitting on the unit as a delimiter would kind of work, but I would lose the units. It also gives me only two strings, which can be split further, but that seems to require a row-by-row function.
Is there a better way to do this?
Upvotes: 1
Views: 488
Reputation: 12801
You can use the string method str.split directly on the column with the text:
df['text'].str.split('(CLF|LF|EA)')
Wrapping the pattern in capturing parentheses keeps the delimiter in the result.
Example:
import pandas as pd
# Alternation of the known units, wrapped in one capturing group
units = r'(CLF|LF|EA)'
df = pd.DataFrame({'text': ['aaaaaaa LF bbbbbbbb', '123456 CLF 78910', '!!!!!!!! EA @@@@@@@@@@']})
df['text'].str.split(units)
returns:
0 [aaaaaaa , LF, bbbbbbbb]
1 [123456 , CLF, 78910]
2 [!!!!!!!! , EA, @@@@@@@@@@]
Name: text, dtype: object
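To get closer to the output asked for in the question, the same idea can be taken one step further with str.extract followed by a whitespace split of the trailing fields. This is only a sketch: the column name description and the group names item, unit and rest are made up for illustration, and it assumes the unit always appears as a standalone token surrounded by whitespace.

```python
import re
import pandas as pd

units = ['CLF', 'LF', 'EA']

# Try longer units first so 'CLF' is attempted before 'LF'.
alternation = '|'.join(sorted(map(re.escape, units), key=len, reverse=True))

# Requiring whitespace on both sides keeps 'EA' from matching inside a longer word.
# The group names (item, unit, rest) are illustrative, not from the original post.
pattern = rf'^(?P<item>.*?)\s+(?P<unit>{alternation})\s+(?P<rest>.*)$'

df = pd.DataFrame({
    'description': [
        'Conduit, PVC Utility Type DB 60 TC-6, 1-1/2" LF .050 $.86 $1.90 $2.76'
    ]
})

# One column per named group, computed for the whole column at once.
parts = df['description'].str.extract(pattern)

# Split everything after the unit on whitespace into separate columns.
trailing = parts.pop('rest').str.split(expand=True)

result = pd.concat([parts, trailing], axis=1)
print(result.iloc[0].tolist())
```

The values after the unit come back as strings, so a numeric column such as .050 would still need a conversion (for example with pd.to_numeric). Note also that the split happens at the first standalone unit token, so a description that itself contains something like " EA " before the real unit would be cut too early.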
Upvotes: 1