Ahmad
Ahmad

Reputation: 413

How to read a comma separated `csv` file into pandas dataframe if it contains a comma separated elements in a column?

I have the following test2.csv file:

id,name,city,country
1,David, Johnson,London,UK,
2,Mary, Gueta,NY,USA,
3,Matthias, Nile,Munich,Germany,

I want to read it into a pandas dataframe. Using this code

df = pd.read_csv('test2.csv')

I get the following df:

enter image description here

But I want to store the first name and last name together in the column name. id column should store the numbers. city and country should also store the appropriate values.

Thank you in advance!

Upvotes: 1

Views: 1707

Answers (4)

addictlinux
addictlinux

Reputation: 11

step1: combined with 'id' & 'name' columns named a new column 'full_name'

df['full_name'] = df['id'].map(str) + '.' + df['name'].map(str)

step2: sorted the 'id' column

df.loc[:,'id'] = list(range(df.shape[0]))

step3: delete 'name' column

del df['name']

Upvotes: 1

Tahmid Faisal
Tahmid Faisal

Reputation: 113

You can change 1,David, Johnson,London,UK to 1,"David, Johnson",London,UK then load it using pd.DataFrame.

This happens because pandas think it is another value and to avoid that you need use quotations.

If you have line breaks in your value you will also need to use quotations.

you can find basic rules of csv from here

Upvotes: 0

Pepe N O
Pepe N O

Reputation: 2344

There is a useful pattern to distinguish between values, space appears after comma in names, so you can substitute delimiter "," with "|" that is more efficient to avoid this type of confusion

import csv
import pandas as pd
import re


#substitute comma not followed by space for vertical bar in the original file and save it as a new file
with open('test2.csv', 'r') as infile, open('test2_vb.csv', 'w') as outfile:
    for line in infile:
        line = re.sub(',(?! )', '|', line)
        outfile.write(line)

file = open('test2_vb.csv', 'r')
reader = csv.reader(file , delimiter='|')
#load the data into a dataframe
df = pd.DataFrame(reader)
print(df) 

this way using a regular expression the file can be corrected (though at the end of a line should not be separator)

Upvotes: 0

Alex
Alex

Reputation: 139

import csv
file = open(`temp.csv')
csvreader = csv.reader(file)
header = []
header = next(csvreader)
new_rows = []
for row in csvreader:
    new_rows.append([row[0], row[1]+','+row[2], row[3], row[4]])
file.close()
df = pd.DataFrame(new_rows, columns =header )
df

Upvotes: 1

Related Questions