KoushikProgrammer

Reputation: 33

Convert .dat into .csv in Python

I want to convert a data set stored in a .dat file into a CSV file. The data format looks like this:

Each row begins with the sentiment score followed by the text associated with that rating.

Image of the .dat file

I want the sentiment value (-1 or 1) in one column and the review text corresponding to that sentiment value in a second column.

WHAT I TRIED SO FAR

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np  
import csv

# read train.dat into a list of lists
datContent = [i.strip().split() for i in open("train.dat").readlines()]

# write it as a new CSV file
with open("train.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)
def your_func(row):
    return row['Sentiments'] / row['Review']

columns_to_keep = ['Sentiments', 'Review']
dataframe = pd.read_csv("train.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print dataframe

Sample screenshot of the resulting train.csv: it has a comma after every word in the review.

Output of the train.csv

Upvotes: 2

Views: 6784

Answers (2)

cs95

Reputation: 402972

If all your rows follow that consistent format, you can use pd.read_fwf. This is a little safer than using read_csv, in the event that your second column also contains the delimiter you are attempting to split on.

df = pd.read_fwf('data.txt', header=None, 
        widths=[2, int(1e5)], names=['label', 'text'])

print(df)
   label                       text
0     -1  ieafxf  rjzy xfxk ymi wuy
1      1     lqqm  ceegjnbjpxnidygr
2     -1  zss awoj anxb rfw  kgbvnl

Contents of data.txt:

-1  ieafxf  rjzy xfxk ymi wuy
+1  lqqm  ceegjnbjpxnidygr
-1  zss awoj anxb rfw  kgbvnl
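If pandas is not needed, the same idea (split each line only once, at the first run of whitespace) can be sketched with the standard library. This is a minimal sketch, not the method from the answer above: the helper name dat_to_rows is hypothetical, the sample lines are copied from data.txt, and the column names Sentiments and Review are taken from the question.

```python
import csv
import io


def dat_to_rows(lines):
    """Yield (label, text) pairs, splitting each line only at the first run of whitespace."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # at most one split, so the review stays in one piece
        label = int(parts[0])  # int() accepts both "+1" and "-1"
        text = parts[1] if len(parts) > 1 else ""
        yield label, text


# In-memory stand-in for the .dat file (assumption: one "label text" record per line).
sample = io.StringIO(
    "-1  ieafxf  rjzy xfxk ymi wuy\n"
    "+1  lqqm  ceegjnbjpxnidygr\n"
)
rows = list(dat_to_rows(sample))

# Write a header plus the two-column rows as CSV text.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Sentiments", "Review"])
writer.writerows(rows)
csv_text = out.getvalue()
```

With real files, the in-memory buffers would be replaced by open("train.dat") and, on Python 3, open("train.csv", "w", newline=""). The attempt in the question calls split() with no limit, which breaks the review into one field per word; that is why a comma appears after every word.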

Upvotes: 4

Evan Nowak

Reputation: 895

As mentioned in the comments, read_csv would be appropriate here, provided the sentiment value and the review are separated by a tab.

df = pd.read_csv('train_csv.csv', sep='\t', names=['Sentiments', 'Review'])

  Sentiments     Review
0         -1    alskjdf
1          1      asdfa
2          1       afsd
3         -1        sdf
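To see why a tab separator keeps each review in a single field, here is a small standard-library sketch. It assumes the file really is tab-delimited, which the question does not confirm, and the two sample lines are made up for illustration.

```python
import csv
import io

# Made-up sample: a label, a tab, then review text containing spaces and a comma.
sample = io.StringIO("-1\tgood plot, weak ending\n1\tloved it\n")

# Only the tab is treated as a delimiter, so spaces and commas stay inside the review.
reader = csv.reader(sample, delimiter="\t")
rows = [(int(label), text) for label, text in reader]
```

If the real file uses spaces rather than a tab between the two parts, this split would not apply and each line would come back as a single field.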

Upvotes: 0
