Reputation: 81
I have a dataframe containing a number of columns and rows, in all of the columns except for the leftmost two, there is data of the form "integer-integer". I would like to split all of these columns into two columns, with each integer in its own cell, and remove the dash.
I have tried to follow the answers in Pandas Dataframe: Split multiple columns each into two columns, but it seems that they are splitting after one element, while I would like to split on the "-".
By way of example, suppose I have a dataframe of the form:
I would like to split the columns labelled 2 through to 22, to have them called 2F, 2A, 3F, 3A, ..., 6A with the data in the first row being R1, Hawthorn, 229, 225, 91, 81, ..., 12.
Thank you for any help.
Upvotes: 3
Views: 80
Reputation: 501
you can use lambda function for split a series
import pandas as pd
df = pd.read_csv("data.csv")
df.head()
>>> data
0 12-24
1 13-26
2 14-28
3 15-30
df["d1"] = df["data"].apply(lambda x: x.split("-")[0])
df["d2"] = df["data"].apply(lambda x: x.split("-")[1])
df.head()
>>>
data d1 d2
0 12-24 12 24
1 13-26 13 26
2 14-28 14 28
3 15-30 15 30
Upvotes: 0
Reputation: 1614
For Input:
df = pd.DataFrame({0: ['R1'], 1: ['Hawthorn'], 2: ['229-225'], 3: ['91-81'], 4:['210-142'], 5:['439-367'], 6:['7-12']})
0 1 2 3 4 5 6
0 R1 Hawthorn 229-225 91-81 210-142 439-367 7-12
Trying the code:
for i in df.columns[2::]:
df[[str(i)+'F', str(i)+'A']] =pd.DataFrame(df[i].str.split('-').tolist(), index= df.index)
del df[i]
Prints (1st row):
0 1 2F 2A 3F 3A 4F 4A 5F 5A 6F 6A
0 R1 Hawthorn 229 225 91 81 210 142 439 367 7 12
Upvotes: 0
Reputation: 862481
You can use DataFrame.set_index
with DataFrame.stack
for Series
, then split to new 2 columns by Series.str.split
, convert to integers, create new columns names by DataFrame.set_axis
, reshape by DataFrame.unstack
, sorting columns by DataFrame.sort_index
and last flatten MultiIndex
with convert index to columns by DataFrame.reset_index
:
#first replace columns names to default values
df.columns = range(len(df.columns))
df = (df.set_index([0,1])
.stack()
.str.split('-', expand=True)
.astype(int)
.set_axis(['F','A'], axis=1, inplace=False)
.unstack()
.sort_index(axis=1, level=[1,0], ascending=[True, False]))
df.columns = df.columns.map(lambda x: f'{x[1]}{x[0]}')
df = df.reset_index()
print (df)
0 1 2F 2A 3F 3A 4F 4A 5F 5A 6F 6A
0 R1 Hawthorn 229 225 91 81 216 142 439 367 7 12
1 R2 Sydney 226 214 93 92 151 167 377 381 12 8
2 R3 Geelong 216 228 91 166 159 121 369 349 16 14
3 R4 North Melbourne 213 239 169 126 142 155 355 394 8 9
4 R5 Gold Coast 248 226 166 94 267 169 455 389 18 6
5 R6 St Kilda 242 197 118 161 158 156 466 353 15 16
6 R7 Fremantle 225 219 72 84 224 185 449 464 7 5
Upvotes: 2