Python pandas.read_csv split column into multiple new columns using comma to separate

Question

I've used pandas.read_csv to load in a file.

I've stored the file into a variable. The first column is a series of numbers separated by a comma (,) I want to split these numbers, and put each number to a new column.

I can't seem to find the write functionality for pandas.dataframe.

Side Note I would prefer a different library for loading in my file, but pandas provides some other different functionality which I need.

My Code:

Data = pandas.read_csv(pathFile,header=None)

doing: print Data gives me:

   0                          1         2          ...
0 [2014, 8, 26, 5, 30, 0.0]   0         0.25       ...

(as you can see its a date)

Question: How to split/separate each number and save it in a new array

p.s. I'm trying to achieve the same thing the matlab method datevec() does

unutbu · Accepted Answer

If the CSV data looks like

"[2014, 8, 26, 5, 30, 0.0]",0,0.25

then

import pandas as pd
import json

df = pd.read_csv('data', header=None)
dates, df = df[0], df.iloc[:, 1:]
df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1,
               ignore_index=True)
print(df)

yields

   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0

with the values parsed as numeric values.

How it works:

dates, df = df[0], df.iloc[:, 1:]

peels off the first column, and reassigns df to the rest of the DataFrame:

In [217]: dates
Out[217]: 
0    [2014, 8, 26, 5, 30, 0.0]
Name: 0, dtype: object

dates contains strings:

In [218]: dates.iloc[0]
Out[218]: '[2014, 8, 26, 5, 30, 0.0]'

We can convert these to a list using json.loads:

In [219]: import json

In [220]: json.loads(dates.iloc[0])
Out[220]: [2014, 8, 26, 5, 30, 0.0]

In [221]: type(json.loads(dates.iloc[0]))
Out[221]: list

We can do this for each row of dates by using apply:

In [222]: dates.apply(lambda x: pd.Series(json.loads(x)))
Out[222]: 
      0  1   2  3   4  5
0  2014  8  26  5  30  0

By making lambda, above, return a Series, apply will return a DataFrame, with the index of the Series becoming the column index of the DataFrame.

Now we can use pd.concat to concatenate this DataFrame with df:

In [228]: df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1, ignore_index=True)

In [229]: df
Out[229]: 
   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0

In [230]: df.dtypes
Out[230]: 
0      int64
1    float64
2    float64
3    float64
4    float64
5    float64
6    float64
7    float64
dtype: object

Python pandas.read_csv split column into multiple new columns using comma to separate

Answers (2)

Related Questions