pooja

Reputation: 39

How to convert a CSV file that uses both comma and space delimiters into one that uses only spaces

I am trying to split the last column, which holds two comma-separated values, into two separate columns. Compare the last columns of the input and output files below to see what I mean.

This is what my input file looks like:

fILENAME sent_no    word POS lab,Slab
File_1   sentence:1  abc NNP B,NO   
                     fhj PSP O,O    
                     bmm NNP B,NO   
                     vbn PSP O,O    
                     vbn NN  B,NO   
                     vbn NNPC B,NO  
                     .  Sym O,O 
File_1   Sentence:2 vbb NNP B,NO    
                    bbn PSP B,NO    
                    nnm NNP O,O 
                    nnn PSP B,NO    
                    bbn NN  O,O 
                    .   Sym O,O 

and this is the output file I expect:

Filename sent_num word POS Label Slab
 File_1 sentence:1 abc NNP B     NO
                   fhj PSP O      O
                   bmm NNP B     NO
                   vbn PSP O      O
                   vbn NN B      NO
                   vbn NNPC B    NO
                   .   Sym O      O
 File_1 Sentence:2 vbb NNP B     NO
                   bbn PSP B     NO
                   nnm NNP O      O
                   nnn PSP B     NO
                   bbn NN  O      O
                   .   Sym O      O

Upvotes: 1

Views: 3815

Answers (3)

Wahyu Hadinoto

Reputation: 208

Try this:

import pandas
df = pandas.read_csv('try.csv',sep=';')
# split the combined column into two new columns
df[['Label','Slabel']]=df['Label,Slabel'].str.split(',',expand=True)
# drop the original combined column
df.drop(['Label,Slabel'],axis=1,inplace=True)
df.to_csv('try2.csv',sep=';')

Your data looks like a MultiIndex DataFrame, though, so I also added this:

df.set_index(['Filename','Sentence_num'],inplace=True)

and the result:

>>> df
                       Word  POS Label Slabel
Filename Sentence_num                        
File_1   sentence:1     abc  NNP     B     NO
         sentence:1     fhj  PSP     O      O
         sentence:1     bmm  NNP     B     NO
         sentence:1     vbn  PSS     O      O
File_2   sentence:2     vbb  NNP     B     NO
         sentence:2     bbn  PSP     B     NO
         sentence:2     nnm  NNP     O      O
         sentence:2    nnnm  PSP     B     NO
>>> 

A simpler way is to pass several separators at once, like this:

import pandas as pd
df = pd.read_csv('try.csv',sep=' |,', engine='python') # separator: space or comma
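
One caveat with `' |,'` is that it matches a single space, so columns padded with several spaces may yield empty extra fields. Here is a minimal sketch of a variant that treats a run of whitespace as one delimiter. The sample text in `raw` is made up to mirror the last three columns of the question, and it assumes no row has missing leading cells:

```python
import io
import pandas as pd

# Hypothetical sample; note the two spaces before "B,NO".
raw = "word POS lab,Slab\nabc NNP  B,NO\nfhj PSP O,O\n"

# r"\s+|," matches either a run of whitespace or a single comma.
# A regex separator requires the python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+|,", engine="python")

# Write the frame back out with a single space as the only delimiter.
out = df.to_csv(sep=" ", index=False)
print(out)
```

Passing `index=False` keeps the row numbers out of the output file.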

Upvotes: 3

JIASI

Reputation: 59

I assume the *.csv file is:

word POS lab,Slab
abc NNP B,NO
fhj PSP O,O
bmm NNP B,NO
vbn PSP O,O
vbn NN B,NO
vbn NNPC B,NO
vbb NNP B,NO
bbn PSP B,NO
nnm NNP O,O
nnn PSP B,NO
bbn NN O,O
. Sym O,O

You can use the csv module to read and write CSV files with a specific delimiter.

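import csv

# path and new_path are the input and output file paths
with open(path, newline='') as csvf, open(new_path, 'w', newline='') as new_csvf:
    rows = csv.reader(csvf, delimiter=' ')
    writer = csv.writer(new_csvf, delimiter=' ')
    for row in rows:
        if not row:  # skip blank lines
            continue
        # replace the trailing "lab,Slab" field with two separate fields
        row[-1:] = row[-1].split(',')
        writer.writerow(row)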
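
The file in the question also leaves the first two columns blank on continuation rows, which a plain space delimiter cannot represent. Below is a rough sketch of one way to handle that without pandas. It assumes only leading cells are ever missing and fills them from the previous row; `parse_rows` and the sample `lines` are hypothetical names and data, not part of the original file:

```python
import re

def parse_rows(lines):
    """Split on whitespace/commas; fill missing leading cells from the previous row."""
    header = re.split(r"[\s,]+", lines[0].strip())
    rows, prev = [], [""] * len(header)
    for line in lines[1:]:
        if not line.strip():  # skip blank lines
            continue
        tokens = re.split(r"[\s,]+", line.strip())
        missing = max(0, len(header) - len(tokens))
        row = prev[:missing] + tokens  # copy leading cells from the row above
        rows.append(row)
        prev = row
    return header, rows

lines = [
    "Filename sent_num word POS Label,Slab",
    "File_1   sentence:1 abc NNP B,NO",
    "                    fhj PSP O,O",
]
header, rows = parse_rows(lines)
for row in [header] + rows:
    print(" ".join(row))
```

Filling the blanks rather than keeping them means every output line has the same number of space-separated fields.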

Upvotes: 1

nocibambi

Reputation: 2421

You can use pandas to separate the 'comma-separated' column into two columns.

Here is an example dataframe

import pandas as pd
df = pd.DataFrame([['a,b'], ['c,d']], columns=['Label,Slabel'])

It looks like this

    Label,Slabel
0   a,b
1   c,d

Then you can split each value into a list and turn each list into a Series, which gives one column per element.

df['Label,Slabel'].str.split(',').apply(pd.Series)

The result

    0   1
0   a   b
1   c   d
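
To get the result back into the original dataframe, one option is to assign the split columns by name and drop the combined column. This is a small sketch on made-up data; it uses `str.split(..., expand=True)`, which returns the columns directly instead of going through `apply(pd.Series)`:

```python
import pandas as pd

# Hypothetical two-row frame with the combined column from the question.
df = pd.DataFrame({"word": ["abc", "fhj"], "Label,Slabel": ["B,NO", "O,O"]})

# expand=True returns a DataFrame with one column per split element.
df[["Label", "Slabel"]] = df["Label,Slabel"].str.split(",", expand=True)

# Remove the original combined column.
df = df.drop(columns=["Label,Slabel"])
print(df)
```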

Upvotes: 2
