Kitt

Reputation: 15

Python - DataFrame - split column data into new columns - explain what some of the code means

I am learning DataFrames and wanted to break up one column into new columns. I accomplished it by trial and error with the 3 lines of code below (it could probably be done in one line, but I wasn't sure how), but I don't really understand some parts of the code I wrote. I was hoping someone could explain what the "1" and "2" in the split and the ".str[1]" and ".str[2]" at the end mean. Thanks

DataRow:
Customer 1234M01 123 BurOak St, 823-123-4567
Customer 5678M02 567 Young St, 819-1234567

Py_Cust['TEMP'] = Py_Cust.DataRow.str.split('Customer ', n=1).str[1]
Py_Cust['ID'] = Py_Cust.TEMP.str.split(' ', n=2).str[1]
Py_Cust['ADDR'] = Py_Cust.TEMP.str.split(' ', n=2).str[2]

Upvotes: 0

Views: 44

Answers (1)

zach bredl

Reputation: 36

The '1' and '2' set the maximum number of splits (the n argument of str.split), so each result has at most n + 1 pieces. For example, when you did

Py_Cust['TEMP'] = Py_Cust.DataRow.str.split('Customer ', n=1).str[1]

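it split each value only at the first 'Customer ', giving two strings: the text before the separator (empty here) and everything after it.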
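To see the effect of the split count directly, here is a small sketch that rebuilds the two sample rows as a DataFrame (an assumption, since the question doesn't show how Py_Cust was created) and splits the first row on a space with different n values. The count is passed as n= by keyword, which works across pandas versions.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame from the two sample rows.
Py_Cust = pd.DataFrame({"DataRow": [
    "Customer 1234M01 123 BurOak St, 823-123-4567",
    "Customer 5678M02 567 Young St, 819-1234567",
]})
row = Py_Cust.DataRow

# n is the maximum number of splits, so the result has at most n + 1 pieces.
print(row.str.split(" ", n=1)[0])
# ['Customer', '1234M01 123 BurOak St, 823-123-4567']
print(row.str.split(" ", n=2)[0])
# ['Customer', '1234M01', '123 BurOak St, 823-123-4567']

# Without n (default -1) it splits at every space.
print(row.str.split(" ")[0])
# ['Customer', '1234M01', '123', 'BurOak', 'St,', '823-123-4567']
```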

The .str[1] or .str[2] part selects which of the resulting pieces you want to assign to that column. Whenever you see square brackets [ ] in Python, you are indexing an object, and indexing starts at 0, so str[1] takes the second item. Referring to the line above, since you split on 'Customer ', str[1] takes the item after it, i.e. the rest of the string.

>>> "Customer 1234M01 123 BurOak St, 823-123-4567".split("Customer ", 1)
['', '1234M01 123 BurOak St, 823-123-4567']
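The sketch below (again rebuilding Py_Cust from the two sample rows, which is an assumption) follows the question's steps and prints what each index picks out. .str[i] works row by row: it pulls element i out of the list stored in every row at once.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame from the two sample rows.
Py_Cust = pd.DataFrame({"DataRow": [
    "Customer 1234M01 123 BurOak St, 823-123-4567",
    "Customer 5678M02 567 Young St, 819-1234567",
]})

# Nothing comes before 'Customer ', so item 0 is '' and item 1 is the rest.
parts = Py_Cust.DataRow.str.split("Customer ", n=1)
print(parts[0])  # ['', '1234M01 123 BurOak St, 823-123-4567']

# .str[i] takes element i (0-based) from the list in each row.
Py_Cust["TEMP"] = parts.str[1]
pieces = Py_Cust.TEMP.str.split(" ", n=2)
print(pieces.str[0].tolist())  # ['1234M01', '5678M02']
print(pieces.str[1].tolist())  # ['123', '567']
print(pieces.str[2].tolist())  # ['BurOak St, 823-123-4567', 'Young St, 819-1234567']

# An index past the end of a row's list gives NaN instead of an IndexError.
print(pieces.str[5].tolist())  # [nan, nan]
```

Run on the rows exactly as printed in the question, index 0 of the second split is the ID and index 1 is only the street number. The question's [1] and [2] would fit if the real values have an extra space after 'Customer ' that the post doesn't show, but that is a guess.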

Here is some documentation for working with strings in general, and here is the pandas documentation for working with strings and DataFrames.
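On the question's aside about doing this in fewer lines: one option, sketched under the assumption that every row has the layout "Customer <ID> <address>", is to split DataRow itself at most twice on a space and pass expand=True, which returns a DataFrame with one column per piece (labelled 0, 1, 2) instead of a column of lists. Piece 1 is then the ID and piece 2 the address, so the separate TEMP step isn't needed.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame from the two sample rows.
Py_Cust = pd.DataFrame({"DataRow": [
    "Customer 1234M01 123 BurOak St, 823-123-4567",
    "Customer 5678M02 567 Young St, 819-1234567",
]})

# One split call: piece 0 is 'Customer', piece 1 the ID, piece 2 the rest.
# expand=True spreads the pieces into columns 0, 1 and 2.
parts = Py_Cust.DataRow.str.split(" ", n=2, expand=True)
Py_Cust["ID"] = parts[1]
Py_Cust["ADDR"] = parts[2]

print(Py_Cust[["ID", "ADDR"]])
```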

Upvotes: 1
