user3757265

Reputation: 437

Using string methods on dataframes in Python Pandas?

I have a dataframe with a column of strings in the following format:

data.description[4000]=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"                                   LF   .050   $.86   $1.90   $2.76']

The string varies in length, but I would like to break it up by splitting at the ' LF ' substring. The desired output would be

data2=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"','LF',.050,'$.86','$1.90','$2.76']

If I were to have a list of units

units=['CLF','LF','EA']

How could I search each string in the dataframe for one of these units and break it up into the format above? Splitting on the unit as a delimiter almost works, but I would lose the unit itself. It also leaves me with 2 strings that still need further splitting, which seems to require a row-by-row function.

Is there a better way to do this?

Upvotes: 1

Views: 488

Answers (1)

JAB

Reputation: 12801

You can use the string method split directly on the column with the text, passing a regular expression that lists the units as alternatives:

df['text'].str.split('(CLF|LF|EA)')

The capturing parentheses around the alternatives make split keep the matched delimiter in the result, so the unit is not lost.

Example:

import pandas as pd
# The capturing group keeps the matched unit in the split result
units = '(CLF|LF|EA)'
df = pd.DataFrame({'text': ['aaaaaaa LF bbbbbbbb', '123456 CLF 78910', '!!!!!!!! EA @@@@@@@@@@']})
df.text.str.split(units)

returns:

0       [aaaaaaa , LF,  bbbbbbbb]
1          [123456 , CLF,  78910]
2    [!!!!!!!! , EA,  @@@@@@@@@@]
Name: text, dtype: object
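
If you want separate columns rather than a list per row, one possible variation (a sketch, not tested against your full data) is to build the pattern from the units list and use str.extract with named groups. The column name description comes from the question; the group names unit and rest, and the assumption that the unit is always surrounded by whitespace, are mine. Requiring whitespace around the unit also avoids matching something like 'EA' inside a longer word.

```python
import re
import pandas as pd

units = ['CLF', 'LF', 'EA']

# Escape each unit and try longer ones first, so 'CLF' is preferred over 'LF'
unit_alt = '|'.join(sorted(map(re.escape, units), key=len, reverse=True))

# Assumption: the unit is a standalone token with whitespace on both sides
pattern = rf'^(?P<description>.*?)\s+(?P<unit>{unit_alt})\s+(?P<rest>.*)$'

df = pd.DataFrame({'description': [
    'Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"                                   LF   .050   $.86   $1.90   $2.76',
]})

# One column per named group: description, unit, rest
parts = df['description'].str.extract(pattern)

# Split the trailing fields on runs of whitespace into their own columns
values = parts['rest'].str.split(expand=True)

result = pd.concat([parts[['description', 'unit']], values], axis=1)
print(result)
```

Rows where none of the units is found come back as NaN from str.extract, so they are easy to spot and handle separately. The trailing values are still strings at this point, so they would need converting if you want numbers.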

Upvotes: 1
