Al Merchant
Al Merchant

Reputation: 107

split columns and name them using pandas

I want to split a column into 3 based on a delimiter ":" and I am able to do that using the code below. But now want to change the names of the split columns from the default of 1,2,3,4. Please advice how I can do that.

from pandas import *
df = DataFrame(
     {'CustomerName' : ["Paul", "John"], 'Seatblocks' : ["2:218:10:4,6","2:218:10:4,6"]}); 
df

df.join(df.Seatblocks.apply(lambda x: Series(x.split(':'))))

Upvotes: 4

Views: 13804

Answers (6)

yohoo
yohoo

Reputation: 193

Here is a small function for automatically adding column names

def col_splitter(df, orig_col, deli):
    #Split the column on the delimiter provided
    splitted=df[orig_col].str.split(deli, expand=True)
    
    # make a dictionary of new column names based on the number of splits
    col_names={k:f'{orig_col}_{k}' for k in splitted.columns.values}

    # Lastly, rename columns ==> 0:'Seatblocks_0', 1:'Seatblocks_1' etc
    splitted.rename(columns=col_names,inplace=True)
    return splitted



col_splitter(df,'Seatblocks',':')

Upvotes: 0

GaboZillion
GaboZillion

Reputation: 11

This may help:

import pandas as pd
import numpy as np

df = pd.DataFrame(
     {'CustomerName' : ["Paul", "John"], 'Seatblocks' : ["2:218:10:4,6","2:218:10:4,6"]}); 
#df
#1.split, rename
df_extraction = df['Seatblocks'].str.split(':', expand=True)
df_extraction.columns = ['seatBlock'+str(i) for i in df_extraction.columns]

# 2.merge extracted to df
df = pd.concat([df, df_extraction], axis=1)
df

CustomerName    Seatblocks  seatBlock0  seatBlock1  seatBlock2  seatBlock3
0   Paul    2:218:10:4,6    2   218     10  4,6
1   John    2:218:10:4,6    2   218     10  4,6

Upvotes: 1

Vor
Vor

Reputation: 35109

You can modify df.column to get the new column names

In [1]: from pandas import *

In [2]: df = DataFrame(
   ...:      {'CustomerName' : ["Paul", "John"], 'Seatblocks' : ["2:218:10:4,6","2:218:10:4,6"]}); 

In [3]: df2 = df.join(df.Seatblocks.apply(lambda x: Series(x.split(':'))))

In [4]: names = ['foo', 'bar', 'baz', 'booz']

In [5]: df2.columns = [x if str(x).isalpha() else names.pop() for x in df2.columns]

In [6]: df2.columns                                              
Out[6]: Index([u'CustomerName', u'Seatblocks', u'booz', u'baz', u'bar', u'foo'], dtype='object')

In [7]: 

Upvotes: 0

DSM
DSM

Reputation: 353059

People have already given the rename approach, but I find these things are easier if you avoid the temptation to cram everything into one line. Once you have a frame, you can simply assign to .columns:

>>> sb = df.Seatblocks.str.split(":").apply(pd.Series)
>>> sb.columns = ["a", "Three Digit", "??", "coord"]
>>> pd.concat([df, sb], axis=1)
  CustomerName    Seatblocks  a Three Digit  ?? coord
0         Paul  2:218:10:4,6  2         218  10   4,6
1         John  2:218:10:4,6  2         218  10   4,6

The first line is simply a version of your (df.Seatblocks.apply(lambda x: Series(x.split(':')))) which takes advantage of the vectorized string operation accessor .str (docs).

Upvotes: 3

EdChum
EdChum

Reputation: 394021

Just rename them:

df.rename(columns={0:'col_1', 1:'col_2', 2:'col_3', 3:'col_4'},inplace=True)

A more obscure method is to form a union of the new names to the first 2 elements of your columns and assign directly:

In [14]:

df.columns = df.columns[:2] | pd.Index(['col_1', 'col_2', 'col_3', 'col_4'])
df
Out[14]:
  CustomerName    Seatblocks col_1 col_2 col_3 col_4
0         Paul  2:218:10:4,6     2   218    10   4,6
1         John  2:218:10:4,6     2   218    10   4,6

Upvotes: 3

Andrew
Andrew

Reputation: 3871

Columns were renamed to A, B, C, D.

from pandas import *
df = DataFrame(
     {'CustomerName' : ["Paul", "John"], 'Seatblocks' : ["2:218:10:4,6","2:218:10:4,6"]}); 
df = df.join(df.Seatblocks.apply(lambda x: Series(x.split(':'))))
df.rename(columns={0: 'A', 1: 'B', 2: 'C', 3: 'D'}, inplace=True)
df

Upvotes: 1

Related Questions