Reputation: 417
I have one 'csv' file it looks like this:
sample data :
Name : Jai
Age : 25
Address: N P IV
Country:
Name : Jack
Age : 18
Address: T U W IX
Country: USA
I want to split this single column into multiple, just like this, Expected result:
Name Age Address Country
Jai 25 N P IV NA
Jack 18 T U W IX USA
Thank you
Upvotes: 2
Views: 1611
Reputation: 765
df1.assign(col3=df1.col1.str.strip().eq("Name").cumsum()).pivot(index="col3",columns="col1",values="col2")
:
Name Age Address Country
Jai 25 N P IV Canada
Jack 18 T U W IX USA
Upvotes: 0
Reputation: 2404
If you format the data as JSON, you could iterate though the object and print out all the dictionary items onto one line. The only problem is getting the spacing right with the text your printing out; line up into nice consistent columns.
data = [
{'Name':'Jai',
'Age':25,
'Address': 'N P IV',
'Country': 'Canada'},
{'Name' : 'Jack',
'Age' : 18,
'Address': 'T U W IX',
'Country': 'USA'}
]
print("Name\tAge\tAddress\tCountry")
out=""
for adres in data:
for x, y in adres.items():
out=out+str(y)+"\t"
print(out)
out=""
Script out put:
Name Age Address Country
Jai 25 N P IV Canada
Jack 18 T U W IX USA
Upvotes: 0
Reputation: 862511
First create 2 columns DataFrame
by read_csv
with separator :\s+
for :
with one or more spaces, then convert second column to numpy array and reshape for 4 'columns'
, create DataFrame by constructor with first 4 values of first column to new columns names and last if necessary convert Age
column to integers:
import pandas as pd
import numpy as np
temp=u"""Name : Jai
Age : 25
Address: N P IV
Country:
Name : Jack
Age : 18
Address: T U W IX
Country: USA"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=":\s+", names=['col1','col2'])
print (df)
col1 col2
0 Name Jai
1 Age 25
2 Address N P IV
3 Country: None
4 Name Jack
5 Age 18
6 Address T U W IX
7 Country USA
c = df['col1'].iloc[:4].str.strip(' :')
#pandas 0.24+
df = pd.DataFrame(df['col2'].to_numpy().reshape(-1, 4), columns=c).rename_axis(None, axis=1)
#pandas below 0.24
#df = pd.DataFrame(df['col2'].values.reshape(-1, 4), columns=c).rename_axis(None, axis=1)
df['Age'] = df['Age'].astype(int)
print (df)
Name Age Address Country
0 Jai 25 N P IV None
1 Jack 18 T U W IX USA
Upvotes: 2