Reputation: 39
I am trying to split the last column, which holds two comma-separated values, into two separate columns. Compare the last columns of the input and output files to see what I mean.
Below is what my input file looks like:
fILENAME sent_no word POS lab,Slab
File_1 sentence:1 abc NNP B,NO
fhj PSP O,O
bmm NNP B,NO
vbn PSP O,O
vbn NN B,NO
vbn NNPC B,NO
. Sym O,O
File_1 Sentence:2 vbb NNP B,NO
bbn PSP B,NO
nnm NNP O,O
nnn PSP B,NO
bbn NN O,O
. Sym O,O
The output file I expect is:
Filename sent_num word POS Label Slab
File_1 sentence:1 abc NNP B NO
fhj PSP O O
bmm NNP B NO
vbn PSP O O
vbn NN B NO
vbn NNPC B NO
. Sym O O
File_1 Sentence:2 vbb NNP B NO
bbn PSP B NO
nnm NNP O O
nnn PSP B NO
bbn NN O O
. Sym O O
Upvotes: 1
Views: 3815
Reputation: 208
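Try this:
import pandas

# Assumes try.csv is semicolon-separated and has a combined 'Label,Slabel' column
df = pandas.read_csv('try.csv', sep=';')
# Split each 'B,NO'-style value into two new columns
df[['Label', 'Slabel']] = df['Label,Slabel'].str.split(',', expand=True)
# Drop the original combined column
df.drop(['Label,Slabel'], axis=1, inplace=True)
df.to_csv('try2.csv', sep=';')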
But your data looks like a MultiIndex DataFrame, so I also added this:
df.set_index(['Filename','Sentence_num'],inplace=True)
And the result:
>>> df
Word POS Label Slabel
Filename Sentence_num
File_1 sentence:1 abc NNP B NO
sentence:1 fhj PSP O O
sentence:1 bmm NNP B NO
sentence:1 vbn PSS O O
File_2 sentence:2 vbb NNP B NO
sentence:2 bbn PSP B NO
sentence:2 nnm NNP O O
sentence:2 nnnm PSP B NO
>>>
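The split step above can be checked without a file. This is a minimal sketch that reads a small, hypothetical semicolon-separated sample from memory with `io.StringIO` in place of `try.csv`; the sample rows are made up to mirror the question's data.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample standing in for try.csv
data = "Word;POS;Label,Slabel\nabc;NNP;B,NO\nfhj;PSP;O,O\n"
df = pd.read_csv(io.StringIO(data), sep=';')

# Split each 'B,NO'-style value into two new columns, then drop the combined one
df[['Label', 'Slabel']] = df['Label,Slabel'].str.split(',', expand=True)
df = df.drop(columns=['Label,Slabel'])

print(df)
```

Using `drop(columns=...)` with reassignment instead of `inplace=True` is just a stylistic choice here; both remove the combined column.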
A simpler way is to use a regex separator that matches either a space or a comma:
import pandas as pd
df = pd.read_csv('try.csv', sep=' |,', engine='python')  # separator: space or comma
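A small in-memory sketch of the regex separator, using hypothetical rows in the question's format. It assumes every row has the same number of fields; in the question's file, continuation lines omit the filename and sentence columns, so those rows would likely not line up with the header.

```python
import io

import pandas as pd

# Hypothetical space-separated sample with a comma inside the last field
data = "word POS lab,Slab\nabc NNP B,NO\nfhj PSP O,O\n"

# The regex ' |,' splits on either a space or a comma;
# regex separators need the Python parsing engine
df = pd.read_csv(io.StringIO(data), sep=' |,', engine='python')
print(df)
```

Note that the header `lab,Slab` is split by the same regex, so the two new columns get their names for free.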
Upvotes: 3
Reputation: 59
I assume the .csv file looks like this:
word POS lab,Slab
abc NNP B,NO
fhj PSP O,O
bmm NNP B,NO
vbn PSP O,O
vbn NN B,NO
vbn NNPC B,NO
vbb NNP B,NO
bbn PSP B,NO
nnm NNP O,O
nnn PSP B,NO
bbn NN O,O
. Sym O,O
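You can use the csv module to read and write a file with a custom delimiter.
import csv

path = 'input.csv'       # hypothetical input file name
new_path = 'output.csv'  # hypothetical output file name

with open(path, newline='') as csvf, open(new_path, 'w', newline='') as new_csvf:
    rows = csv.reader(csvf, delimiter=' ')
    writer = csv.writer(new_csvf, delimiter=' ')
    for row in rows:
        # Replace the last field, e.g. 'B,NO', with its two parts 'B' and 'NO'
        row[-1:] = row[-1].split(',')
        writer.writerow(row)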
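Because this approach only touches the last field of each row, it should also work on the question's original layout, where continuation lines omit the filename and sentence columns. A sketch using in-memory `io.StringIO` buffers instead of real files, with a few lines copied from the question:

```python
import csv
import io

# A few lines from the question's input; the rows have different lengths
src = io.StringIO(
    "fILENAME sent_no word POS lab,Slab\n"
    "File_1 sentence:1 abc NNP B,NO\n"
    "fhj PSP O,O\n"
)
dst = io.StringIO()

# lineterminator='\n' avoids the csv module's default '\r\n'
writer = csv.writer(dst, delimiter=' ', lineterminator='\n')
for row in csv.reader(src, delimiter=' '):
    # Replace the last field with its comma-separated parts
    row[-1:] = row[-1].split(',')
    writer.writerow(row)

output = dst.getvalue()
print(output)
```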
Upvotes: 1
Reputation: 2421
You can use pandas to split the comma-separated column into two columns.
Here is an example DataFrame:
import pandas as pd
df = pd.DataFrame([['a,b'], ['c,d']], columns=['Label,Slabel'])
It looks like this:
Label,Slabel
0 a,b
1 c,d
Then split each value into a list and expand each list into a Series:
df['Label,Slabel'].str.split(',').apply(pd.Series)
The result:
0 1
0 a b
1 c d
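As an alternative (not part of the original answer), `Series.str.split` with `expand=True` returns a DataFrame directly, so `apply(pd.Series)` is not needed. A sketch that also names the new columns and puts them in place of the combined one:

```python
import pandas as pd

df = pd.DataFrame([['a,b'], ['c,d']], columns=['Label,Slabel'])

# expand=True returns one column per split part
parts = df['Label,Slabel'].str.split(',', expand=True)
parts.columns = ['Label', 'Slabel']

# Replace the combined column with the two named columns
df = df.drop(columns=['Label,Slabel']).join(parts)
print(df)
```

`join` aligns on the index, which is safe here because `parts` was derived from `df` and shares its index.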
Upvotes: 2