Reputation: 83
I would like to split a column into multiple columns in my data frame. It is separated with commas.
I would like to apply something like 'text to columns' function in excel.
I will give my own headings after I split the columns. 'Turnstile' is the name of my column. I have:
(A006, R079, 00-00-04, 5 AVE-59 ST)
types of data in each row. In the end I would like to have:
A006 R079 00-00-04 5 AVE-59 ST
with the headings I will create.
I lastly tried:
df.Turnstile.str.split().tolist()
But all i have is 'nan'
When I check the type of 'Turnstile' column, it says 'object. I tried to convert that series into string with:
df['Turnstile'] = df[['Turnstile'].astype(str)]
but it gives me:
AttributeError: 'list' object has no attribute 'astype'
Please advise.
Thank you.
Upvotes: 2
Views: 12601
Reputation: 1102
Maybe another way of looking at this is converting a column of tuples to a DataFrame
, like so:
In [10]: DataFrame(df['Turnstile'].tolist())
Out[10]:
0 1 2 3
0 A006 R079 00-00-04 5 AVE-59 ST
1 A006 R079 00-00-04 5 AVE-59 ST
2 A006 R079 00-00-04 5 AVE-59 ST
3 A006 R079 00-00-04 5 AVE-59 ST
4 A006 R079 00-00-04 5 AVE-59 ST
5 A006 R079 00-00-04 5 AVE-59 ST
6 A006 R079 00-00-04 5 AVE-59 ST
7 A006 R079 00-00-04 5 AVE-59 ST
8 A006 R079 00-00-04 5 AVE-59 ST
9 A006 R079 00-00-04 5 AVE-59 ST
If that's the case, here's an example that converts the column of tuples to a DataFrame
and adds it back to the original dataframe:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# create a fake dataframe, repeating the tuple given in the example
In [2]: df = DataFrame(data={'Observations': np.random.randn(10) * np.arange(10),
...: 'Turnstile': (('A006', 'R079', '00-00-04', '5 AVE-59 ST'),)*10})
In [3]: df.head()
Out[3]:
Observations Turnstile
0 -0.000000 (A006, R079, 00-00-04, 5 AVE-59 ST)
1 -0.022668 (A006, R079, 00-00-04, 5 AVE-59 ST)
2 -2.380515 (A006, R079, 00-00-04, 5 AVE-59 ST)
3 -4.209983 (A006, R079, 00-00-04, 5 AVE-59 ST)
4 3.932902 (A006, R079, 00-00-04, 5 AVE-59 ST)
# all at once turn the column of tuples into a dataframe and concat that with the original df
In [4]: df = pd.concat([df,DataFrame(df['Turnstile'].tolist())], axis=1, join='outer')
In [5]: df.head()
Out[5]:
Observations Turnstile 0 1 2 \
0 -0.000000 (A006, R079, 00-00-04, 5 AVE-59 ST) A006 R079 00-00-04
1 -0.022668 (A006, R079, 00-00-04, 5 AVE-59 ST) A006 R079 00-00-04
2 -2.380515 (A006, R079, 00-00-04, 5 AVE-59 ST) A006 R079 00-00-04
3 -4.209983 (A006, R079, 00-00-04, 5 AVE-59 ST) A006 R079 00-00-04
4 3.932902 (A006, R079, 00-00-04, 5 AVE-59 ST) A006 R079 00-00-04
3
0 5 AVE-59 ST
1 5 AVE-59 ST
2 5 AVE-59 ST
3 5 AVE-59 ST
4 5 AVE-59 ST
# i assume you don't need this column anymore
In [6]: del df['Turnstile']
If that works you can of course name the new columns as needed.
Upvotes: 3
Reputation: 36555
Couple options here, if your data is in true csv format, say as an export from Excel, you can use pandas.read_csv
to read in the file, and it will automatically be split into columns based on the column delimiters.
If your data is a string column with commas, you can use str.split
to redefine your columns, but as far as I know, you need to dump the resulting column as a raw Python list and then recast as dataframe:
import pandas as pd
df = pd.DataFrame([["A006, R079, 00-00-04, 5 AVE-59 ST"]])
df2 = pd.DataFrame(df[0].str.split(',').tolist())
Upvotes: 0