dev_user
dev_user

Reputation: 417

How to separate one column into multiple columns in python?

I have one 'csv' file it looks like this:

sample data :

    Name : Jai
    Age : 25
    Address: N P IV 
    Country: 
    Name : Jack
    Age : 18
    Address: T U W IX 
    Country: USA

I want to split this single column into multiple, just like this, Expected result:

    Name        Age        Address        Country
    Jai         25          N P IV         NA
    Jack        18          T U W IX       USA

Thank you

Upvotes: 2

Views: 1611

Answers (3)

G.G
G.G

Reputation: 765

df1.assign(col3=df1.col1.str.strip().eq("Name").cumsum()).pivot(index="col3",columns="col1",values="col2")

:


Name    Age Address  Country
Jai     25  N P IV   Canada 
Jack    18  T U W IX         USA   

Upvotes: 0

9716278
9716278

Reputation: 2404

If you format the data as JSON, you could iterate though the object and print out all the dictionary items onto one line. The only problem is getting the spacing right with the text your printing out; line up into nice consistent columns.

data = [
    {'Name':'Jai',
     'Age':25,
     'Address': 'N P IV',
     'Country': 'Canada'},
    {'Name' : 'Jack',
     'Age' : 18,
     'Address': 'T U W IX',
     'Country': 'USA'}
]

print("Name\tAge\tAddress\tCountry")

out=""

for adres in data:
    for x, y in adres.items():
        out=out+str(y)+"\t"
    print(out)
    out=""

Script out put:

Name    Age Address  Country
Jai     25  N P IV   Canada 
Jack    18  T U W IX         USA    

Upvotes: 0

jezrael
jezrael

Reputation: 862511

First create 2 columns DataFrame by read_csv with separator :\s+ for : with one or more spaces, then convert second column to numpy array and reshape for 4 'columns', create DataFrame by constructor with first 4 values of first column to new columns names and last if necessary convert Age column to integers:

import pandas as pd
import numpy as np

temp=u"""Name : Jai
Age : 25
Address: N P IV 
Country: 
Name : Jack
Age : 18
Address: T U W IX 
Country: USA"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=":\s+", names=['col1','col2'])

print (df)
       col1      col2
0     Name        Jai
1      Age         25
2   Address    N P IV
3  Country:      None
4     Name       Jack
5      Age         18
6   Address  T U W IX
7   Country       USA

c = df['col1'].iloc[:4].str.strip(' :')
#pandas 0.24+
df = pd.DataFrame(df['col2'].to_numpy().reshape(-1, 4), columns=c).rename_axis(None, axis=1)
#pandas below 0.24
#df = pd.DataFrame(df['col2'].values.reshape(-1, 4), columns=c).rename_axis(None, axis=1)

df['Age'] = df['Age'].astype(int)
print (df)
   Name  Age   Address Country
0   Jai   25    N P IV    None
1  Jack   18  T U W IX     USA

Upvotes: 2

Related Questions