Ranjeet

Reputation: 382

Read csv file with regular expression delimiter

I have a csv file like this:

x,y,z, vec, s2
1,2,3,(1,2,3),5
3,4,3,(4,5,3),8

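I want to read this file so that vec stays a single column holding (a,b,c). When I read it with pd.read_csv(filename), the commas inside the parentheses are treated as delimiters, so the columns come out wrong.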

Upvotes: 0

Views: 146

Answers (2)

Econundrums

Reputation: 317

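I would do something like this:

import re
import pandas as pd

with open(r'myFile.csv', 'r') as file:
    data = file.read().splitlines()

# Split on commas outside parentheses; \s* also drops the whitespace after a comma
sep = re.compile(r',\s*(?![^()]*\))')
cols = sep.split(data[0])
dat = [sep.split(line) for line in data[1:] if line.strip()]  # skip blank lines
df = pd.DataFrame(dat, columns=cols)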

However, keep in mind that this method returns every value in the dataframe as a string. Converting the single numbers to integers is easy with the apply() function and int(); the tricky part is turning the tuple strings into actual tuples. For that, use literal_eval:

from ast import literal_eval
df['vec'] = df['vec'].apply(literal_eval)
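
As a rough sketch of both conversions together (not the answerer's exact code): the small frame below is a hypothetical stand-in for the all-strings result, and the list int_cols is a name made up for this illustration. It uses astype(int) for the plain numbers and literal_eval for the tuples.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for the all-strings dataframe built from the file
df = pd.DataFrame(
    [['1', '2', '3', '(1,2,3)', '5'], ['3', '4', '3', '(4,5,3)', '8']],
    columns=['x', 'y', 'z', 'vec', 's2'],
)

# Plain numeric columns: cast the strings to integers
int_cols = ['x', 'y', 'z', 's2']
df = df.astype({col: int for col in int_cols})

# Tuple column: parse each string such as '(1,2,3)' into a real tuple
df['vec'] = df['vec'].apply(literal_eval)
```

literal_eval only evaluates Python literals, so it is a safer choice than eval() for text that comes from a file.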

Upvotes: 0

watfe

Reputation: 107

You could load the csv as a string, split it into a list of rows with a regular expression, and then turn that list into a dataframe.

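import re
import pandas as pd

with open('test.csv') as f:
    text = f.read() + '\n'  # make sure the last row ends with a newline

# Four plain fields plus one greedy group (.*) that keeps the parenthesised tuple intact
reArr = re.findall(r'([^,\n]+),\s*([^,\n]+),\s*([^,\n]+),\s*(.*),\s*([^,\n]+)\n', text)
df = pd.DataFrame(reArr[1:], columns=reArr[0])
print(df)

Output:

   x  y  z      vec  s2
0  1  2  3  (1,2,3)   5
1  3  4  3  (4,5,3)   8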
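
Since the question title asks for a regular expression delimiter, here is a related sketch that is not part of the answer above. It assumes pandas' Python parser engine, which accepts a regex as sep, and it uses an in-memory copy of the sample data (the raw variable is made up for this illustration) instead of a real file.

```python
import io

import pandas as pd

# In-memory copy of the sample file from the question
raw = "x,y,z, vec, s2\n1,2,3,(1,2,3),5\n3,4,3,(4,5,3),8\n"

# Split on a comma (plus any following whitespace) only when it is
# not followed by a ')' without an intervening '(' or ')',
# i.e. only when the comma sits outside the parentheses.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r',\s*(?![^()]*\))',
    engine='python',  # regex separators need the Python engine
)
print(df.columns.tolist())
print(df['vec'].tolist())
```

The vec column still holds strings at this point; applying literal_eval to it afterwards would turn them into tuples. The pattern assumes the parentheses are never nested.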

Upvotes: 2
