Wahyudi
Wahyudi

Reputation: 39

how to manipulate data from text to CSV

I have text file data.txt like this:

Name male  female
bayu 1 0
jonson 1 0
anna 0 1

i have try to convert this file using pandas python

import pandas as pd
df = pd.read_fwf('data.txt')
df.to_csv('data.csv')

I want to get results data.csv like this:

Gender name
bayu male
jonson male
Anna  Female

what should i do?

Upvotes: 1

Views: 63

Answers (3)

Karn Kumar
Karn Kumar

Reputation: 8826

Simply can be done with replace and rename ..

DataFrame:

>>> df
    Name  female  male
0   bayu       0     1
1  jason       0     1
2   anna       1     0

Result:

>>> df.replace({ 0: "male", 1: "Female"}).drop(columns={'male'}).rename(columns={'female': 'Gender'})
    Name  Gender
0   bayu    male
1  jason    male
2   anna  Female

saving dataframe to csv..

>>> df = df.replace({ 0: "male", 1: "Female"}).drop(columns {'male'}).rename(columns={'female': 'Gender'})
>>> df.to_csv('data.csv', index=False)

Your CSV will be looks like below with comma delimited ..

$ cat data.csv
Name,Gender
bayu,male
jason,male
anna,Female

Or, Just with idxmax and df.assign to assign back the values to nw columns altogether.

>>> df.assign(Gender = df[['male','female']].idxmax(axis=1)).drop(columns={'male', 'female'})
    Name  Gender
0   bayu    male
1  jason    male
2   anna  female

2) Another way around creating a function and then using apply method.

 def setval(col):
   col = str(col)
   if col.startswith('0'):
     return  'Male'
   if col.startswith('1'):
     return  'Female'


>>> df.assign(Gender = df[ 'female'].apply(setval)).drop({"female", "male"}, axis=1)
    Name  Gender
0   bayu    Male
1  jason    Male
2   anna  Female

3) Just for the sake of posterity ..

You can create a dict and then map the dict against one of the column assign the new values to a new column with df.assignfurther dropping the un-desired columns.

Here, yo can use either map or replace method at your ease ..

>>> newVal = {0:'Male', 1: 'Female'}   # dictionary paired with key, values

a) in the below example axis=1 represents the column which is newer method.

>>> df.assign(Gender=df.female.map(newVal)).drop({'male', 'female'},axis=1)
    Name  Gender
0   bayu    Male
1  jason    Male
2   anna  Female

b) while in the below example we are explicitly mentioning it to be applied on the column which is old method.

>>> df.assign(Gender=df.female.map(newVal)).drop(columns={'male', 'female'})
    Name  Gender
0   bayu    Male
1  jason    Male
2   anna  Female

With .replace

>>> df.assign(Gender=df.female.replace(newVal)).drop({'male', 'female'},axis=1)
    Name  Gender
0   bayu    Male
1  jason    Male
2   anna  Female

Upvotes: 1

Joe
Joe

Reputation: 12417

You can use idxmax() before to save to csv:

df['Gender'] = df[['male', 'female']].idxmax(axis=1)
df = df.drop(columns=['male', 'female'])

Upvotes: 1

Hryhorii Pavlenko
Hryhorii Pavlenko

Reputation: 3910

import numpy as np
df['Gender'] = np.where(df['male'].eq(1), 'male', 'female')
df = df.drop(columns={'male', 'female'})
df.to_csv('data.csv', index=False)
    Name    Gender
0   bayu    male
1   jonson  male
2   anna    female

Upvotes: 1

Related Questions