TNoms
TNoms

Reputation: 81

Pandas Dataframe

I have a dataframe containing a number of columns and rows, in all of the columns except for the leftmost two, there is data of the form "integer-integer". I would like to split all of these columns into two columns, with each integer in its own cell, and remove the dash.

I have tried to follow the answers in Pandas Dataframe: Split multiple columns each into two columns, but it seems that they are splitting after one element, while I would like to split on the "-".

By way of example, suppose I have a dataframe of the form:

enter image description here

I would like to split the columns labelled 2 through to 22, to have them called 2F, 2A, 3F, 3A, ..., 6A with the data in the first row being R1, Hawthorn, 229, 225, 91, 81, ..., 12.

Thank you for any help.

Upvotes: 3

Views: 80

Answers (3)

Deepak Raj
Deepak Raj

Reputation: 501

you can use lambda function for split a series

import pandas as pd

df = pd.read_csv("data.csv")
df.head()
>>> data
0  12-24
1  13-26
2  14-28
3  15-30
df["d1"] = df["data"].apply(lambda x: x.split("-")[0])
df["d2"] = df["data"].apply(lambda x: x.split("-")[1])
df.head()
>>>
    data  d1  d2
0  12-24  12  24
1  13-26  13  26
2  14-28  14  28
3  15-30  15  30

Upvotes: 0

sharathnatraj
sharathnatraj

Reputation: 1614

For Input:

df = pd.DataFrame({0: ['R1'], 1: ['Hawthorn'], 2: ['229-225'],  3: ['91-81'], 4:['210-142'], 5:['439-367'], 6:['7-12']})

    0         1        2      3        4        5     6
0  R1  Hawthorn  229-225  91-81  210-142  439-367  7-12

Trying the code:

for i in df.columns[2::]:
    df[[str(i)+'F', str(i)+'A']] =pd.DataFrame(df[i].str.split('-').tolist(), index= df.index)
    del df[i] 

Prints (1st row):

    0         1   2F   2A  3F  3A   4F   4A   5F   5A 6F  6A
0  R1  Hawthorn  229  225  91  81  210  142  439  367  7  12

Upvotes: 0

jezrael
jezrael

Reputation: 862481

You can use DataFrame.set_index with DataFrame.stack for Series, then split to new 2 columns by Series.str.split, convert to integers, create new columns names by DataFrame.set_axis, reshape by DataFrame.unstack, sorting columns by DataFrame.sort_index and last flatten MultiIndexwith convert index to columns by DataFrame.reset_index:

#first replace columns names to default values
df.columns = range(len(df.columns))

df = (df.set_index([0,1])
        .stack()
        .str.split('-', expand=True)
        .astype(int)
        .set_axis(['F','A'], axis=1, inplace=False)
        .unstack()
        .sort_index(axis=1, level=[1,0], ascending=[True, False]))
df.columns = df.columns.map(lambda x: f'{x[1]}{x[0]}')
df = df.reset_index()
print (df)
    0                1   2F   2A   3F   3A   4F   4A   5F   5A  6F  6A
0  R1         Hawthorn  229  225   91   81  216  142  439  367   7  12
1  R2           Sydney  226  214   93   92  151  167  377  381  12   8
2  R3          Geelong  216  228   91  166  159  121  369  349  16  14
3  R4  North Melbourne  213  239  169  126  142  155  355  394   8   9
4  R5       Gold Coast  248  226  166   94  267  169  455  389  18   6
5  R6         St Kilda  242  197  118  161  158  156  466  353  15  16
6  R7        Fremantle  225  219   72   84  224  185  449  464   7   5

Upvotes: 2

Related Questions