Reputation: 15
I am learning DataFrames and wanted to break up one column into new columns. I accomplished it by trial and error with the three lines of code below (it could probably be done in one line, but I wasn't sure how), but I don't really understand some parts of the code I wrote. I was hoping someone could explain what the "1" and "2" in the split calls and the ".str[1]" and ".str[2]" at the end mean. Thanks!
DataRow:
Customer 1234M01 123 BurOak St, 823-123-4567
Customer 5678M02 567 Young St, 819-1234567
Py_Cust['TEMP']=Py_Cust.DataRow.str.split('Customer ', n=1).str[1]
Py_Cust['ID']=Py_Cust.TEMP.str.split(' ', n=2).str[1]
Py_Cust['ADDR']=Py_Cust.TEMP.str.split(' ', n=2).str[2]
Upvotes: 0
Views: 44
Reputation: 36
The 1 and 2 are the n argument of split: the maximum number of splits to perform, so each value is cut into at most n + 1 pieces. For example, when you did
Py_Cust['TEMP']=Py_Cust.DataRow.str.split('Customer ', n=1).str[1]
it split each value at the first 'Customer ' only, giving two strings.
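To make the effect of n concrete, here is a small sketch on the remainder of the first sample row (the variable names are just for illustration). It assumes pandas 2.x, where n must be passed by keyword. Plain Python's str.split behaves the same way with its maxsplit argument.

```python
import pandas as pd

# Remainder of the first sample row after "Customer " has been removed
rest = pd.Series(["1234M01 123 BurOak St, 823-123-4567"])

one_cut = rest.str.split(" ", n=1)   # at most 1 split  -> up to 2 pieces
two_cuts = rest.str.split(" ", n=2)  # at most 2 splits -> up to 3 pieces
all_cuts = rest.str.split(" ")       # default n=-1: split at every space

print(one_cut[0])   # ['1234M01', '123 BurOak St, 823-123-4567']
print(two_cuts[0])  # ['1234M01', '123', 'BurOak St, 823-123-4567']
print(all_cuts[0])  # ['1234M01', '123', 'BurOak', 'St,', '823-123-4567']
```

Limiting n is what keeps the rest of the address together in the last piece instead of breaking it at every space.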
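The .str[1] or .str[2] part chooses which of the newly created pieces you want to assign to that column. After split, each cell holds a list of strings, and .str[i] indexes into every one of those lists at once, returning a new Series. More generally, whenever you see square brackets [ ] in Python, you are indexing (or slicing) an object. Indexing starts at 0, so str[1] takes the second item. In the line above, you split on 'Customer ', and the separator itself is removed: str[0] is whatever came before it (an empty string, because the value starts with 'Customer ') and str[1] is everything after it, i.e. the rest of the string.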
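A sketch of the element-wise indexing on the two sample rows (variable names are illustrative). With the rows exactly as posted, single spaces throughout, the ID lands at index 0 and index 1 is the street number. The last part assumes, hypothetically, that the real data has an extra space after "Customer" (easy to lose when a post is rendered). In that case TEMP starts with a space, the first piece is empty, and [1] and [2] give the ID and full address, which may be why those indices worked for you.

```python
import pandas as pd

# TEMP as the question's first line would produce it from the sample rows
temp = pd.Series([
    "1234M01 123 BurOak St, 823-123-4567",
    "5678M02 567 Young St, 819-1234567",
])

pieces = temp.str.split(" ", n=2)  # each element is now a 3-item list

# .str[i] picks item i (0-based) from EVERY list in the Series
print(pieces.str[0].tolist())  # ['1234M01', '5678M02']
print(pieces.str[1].tolist())  # ['123', '567']
print(pieces.str[2].tolist())  # ['BurOak St, 823-123-4567', 'Young St, 819-1234567']

# Hypothetical: TEMP with a leading space (e.g. two spaces after "Customer")
padded = pd.Series([" 1234M01 123 BurOak St, 823-123-4567"]).str.split(" ", n=2)
print(padded[0])  # ['', '1234M01', '123 BurOak St, 823-123-4567']
```

If a list has fewer items than the index you ask for, .str[i] returns NaN for that row instead of raising an error.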
>>> "Customer 1234M01 123 BurOak St, 823-123-4567".split("Customer ", 1)
['', '1234M01 123 BurOak St, 823-123-4567']
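The question also asks whether this could be done in fewer lines. One possible sketch, assuming pandas 1.4 or later (for Series.str.removeprefix) and the sample rows exactly as posted, drops the TEMP column and uses expand=True so the split returns columns directly. The ID and ADDR names are the question's own.

```python
import pandas as pd

# Sample rows from the question, assuming single spaces between tokens
Py_Cust = pd.DataFrame({"DataRow": [
    "Customer 1234M01 123 BurOak St, 823-123-4567",
    "Customer 5678M02 567 Young St, 819-1234567",
]})

# Strip the fixed "Customer " prefix, then cut once at the first space.
# expand=True returns a DataFrame with one column per piece (columns 0 and 1)
# instead of a Series of lists, so no .str[i] lookup is needed.
parts = (
    Py_Cust["DataRow"]
    .str.removeprefix("Customer ")
    .str.split(" ", n=1, expand=True)
)
Py_Cust["ID"] = parts[0]
Py_Cust["ADDR"] = parts[1]

print(Py_Cust[["ID", "ADDR"]])
```

Here ADDR keeps the street number and phone together, since only the first space is used as a cut.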
Here is some documentation for working with strings in Python in general, and here is the pandas documentation for working with strings and DataFrames.
Upvotes: 1