lorelai
lorelai

Reputation: 83

Splitting one column into multiple columns with python pandas

I would like to split a column into multiple columns in my data frame. It is separated with commas.

I would like to apply something like 'text to columns' function in excel.

I will give my own headings after I split the columns. 'Turnstile' is the name of my column. I have:

(A006, R079, 00-00-04, 5 AVE-59 ST)

types of data in each row. In the end I would like to have:

A006    R079   00-00-04   5 AVE-59 ST

with the headings I will create.

I lastly tried:

df.Turnstile.str.split().tolist()

But all i have is 'nan'

When I check the type of 'Turnstile' column, it says 'object. I tried to convert that series into string with:

df['Turnstile'] = df[['Turnstile'].astype(str)]

but it gives me:

AttributeError: 'list' object has no attribute 'astype'

Please advise.

Thank you.

Upvotes: 2

Views: 12601

Answers (3)

lightyagami96
lightyagami96

Reputation: 336

try doing df.Turnstile.str.split(',')

Upvotes: 0

measureallthethings
measureallthethings

Reputation: 1102

Maybe another way of looking at this is converting a column of tuples to a DataFrame, like so:

In [10]: DataFrame(df['Turnstile'].tolist())
Out[10]:
      0     1         2            3
0  A006  R079  00-00-04  5 AVE-59 ST
1  A006  R079  00-00-04  5 AVE-59 ST
2  A006  R079  00-00-04  5 AVE-59 ST
3  A006  R079  00-00-04  5 AVE-59 ST
4  A006  R079  00-00-04  5 AVE-59 ST
5  A006  R079  00-00-04  5 AVE-59 ST
6  A006  R079  00-00-04  5 AVE-59 ST
7  A006  R079  00-00-04  5 AVE-59 ST
8  A006  R079  00-00-04  5 AVE-59 ST
9  A006  R079  00-00-04  5 AVE-59 ST

If that's the case, here's an example that converts the column of tuples to a DataFrame and adds it back to the original dataframe:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# create a fake dataframe, repeating the tuple given in the example
In [2]: df = DataFrame(data={'Observations': np.random.randn(10) * np.arange(10),
...:     'Turnstile': (('A006', 'R079', '00-00-04', '5 AVE-59 ST'),)*10})

In [3]: df.head()
Out[3]:
   Observations                            Turnstile
0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)
1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)
2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)
3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)
4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)

# all at once turn the column of tuples into a dataframe and concat that with the original df
In [4]: df = pd.concat([df,DataFrame(df['Turnstile'].tolist())], axis=1, join='outer')

In [5]: df.head()
Out[5]:
       Observations                            Turnstile     0     1         2  \
    0     -0.000000  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    1     -0.022668  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    2     -2.380515  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    3     -4.209983  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04
    4      3.932902  (A006, R079, 00-00-04, 5 AVE-59 ST)  A006  R079  00-00-04

         3
0  5 AVE-59 ST
1  5 AVE-59 ST
2  5 AVE-59 ST
3  5 AVE-59 ST
4  5 AVE-59 ST

# i assume you don't need this column anymore
In [6]: del df['Turnstile']

If that works you can of course name the new columns as needed.

Upvotes: 3

maxymoo
maxymoo

Reputation: 36555

Couple options here, if your data is in true csv format, say as an export from Excel, you can use pandas.read_csv to read in the file, and it will automatically be split into columns based on the column delimiters.

If your data is a string column with commas, you can use str.split to redefine your columns, but as far as I know, you need to dump the resulting column as a raw Python list and then recast as dataframe:

import pandas as pd
df = pd.DataFrame([["A006, R079, 00-00-04, 5 AVE-59 ST"]])
df2 = pd.DataFrame(df[0].str.split(',').tolist())

Upvotes: 0

Related Questions