PEREZje

Reputation: 2492

Pandas DataFrame: split a column into 2 columns and convert a number with a comma into an integer

I am currently running into two issues.

My DataFrame looks like this:

, male_female, no_of_students
0, 24 : 76, "81,120"
1, 33 : 67, "12,270"
2, 50 : 50, "10,120"
3, 42 : 58, "5,120"
4, 12 : 88, "2,200"

What I would like to achieve is this:

, male, female, no_of_students
0, 24, 76, 81120
1, 33, 67, 12270
2, 50, 50, 10120
3, 42, 58, 5120
4, 12, 88, 2200

Basically, I want to split male_female into two columns and convert no_of_students into a column of integers. I tried several things, including converting the no_of_students column to another type with .astype, but nothing seems to work properly. I also couldn't find a clean way to split the male_female column.

Hopefully someone can help me out!

Upvotes: 3

Views: 660

Answers (1)

jezrael

Reputation: 863226

Use str.split with pop to create the new columns from the ' : ' separator. Then strip the surrounding quotes and spaces from no_of_students, remove the commas with replace, and convert to integers if necessary:

df[['male','female']] = df.pop('male_female').str.split(' : ', expand=True)
df['no_of_students'] = df['no_of_students'].str.strip('" ').str.replace(',','').astype(int)
df = df[['male','female', 'no_of_students']]

print (df)
  male female  no_of_students
0   24     76           81120
1   33     67           12270
2   50     50           10120
3   42     58            5120
4   12     88            2200
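
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, so the real data may look different. It also assumes that male and female should be integers, as in the desired output; str.split returns strings, so the sketch adds an astype(int) on the split result.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({
    "male_female": ["24 : 76", "33 : 67", "50 : 50", "42 : 58", "12 : 88"],
    "no_of_students": ['"81,120"', '"12,270"', '"10,120"', '"5,120"', '"2,200"'],
})

# pop removes male_female from df and returns it as a Series.
# str.split yields strings, so cast both halves to int afterwards.
ratio = df.pop("male_female").str.split(" : ", expand=True)
df[["male", "female"]] = ratio.astype(int)

# Drop surrounding quotes/spaces and the thousands separator, then cast.
df["no_of_students"] = (
    df["no_of_students"]
    .str.strip('" ')
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Put the columns in the order shown in the question.
df = df[["male", "female", "no_of_students"]]
print(df.dtypes)
```

Passing regex=False makes it explicit that the comma is treated as a literal character rather than a pattern.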

Upvotes: 6
