Reputation: 33
I want to convert a data set from a .dat file into a CSV file. The data format looks like this:
Each row begins with the sentiment score followed by the text associated with that rating.
I want the sentiment value (-1 or 1) in one column and the review text corresponding to that sentiment value in a second column.
WHAT I TRIED SO FAR
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import csv
# read train.dat to a list of lists
datContent = [i.strip().split() for i in open("train.dat").readlines()]
# write it as a new CSV file
with open("train.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)
def your_func(row):
    return row['Sentiments'] / row['Review']
columns_to_keep = ['Sentiments', 'Review']
dataframe = pd.read_csv("train.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)
print dataframe
Sample screenshot of the resulting train.csv: it has a comma after every word in the review.
Upvotes: 2
Views: 6784
Reputation: 402972
If all your rows follow that consistent format, you can use pd.read_fwf. This is a little safer than using read_csv, in the event that your second column also contains the delimiter you are attempting to split on.
df = pd.read_fwf('data.txt', header=None,
                 widths=[2, int(1e5)], names=['label', 'text'])
print(df)
   label                      text
0     -1  ieafxf rjzy xfxk ymi wuy
1      1     lqqm ceegjnbjpxnidygr
2     -1  zss awoj anxb rfw kgbvnl
Contents of data.txt:
-1 ieafxf rjzy xfxk ymi wuy
+1 lqqm ceegjnbjpxnidygr
-1 zss awoj anxb rfw kgbvnl
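To get from the parsed frame to the CSV file the question asks for, one option is DataFrame.to_csv. The following is a minimal sketch, not part of the original answer: it reads the same three sample rows from an in-memory string (the `sample` variable is a stand-in for data.txt) and returns the CSV text instead of writing a file.

import io
import pandas as pd

# Stand-in for data.txt: sentiment label, one space, then the review text.
sample = (
    "-1 ieafxf rjzy xfxk ymi wuy\n"
    "+1 lqqm ceegjnbjpxnidygr\n"
    "-1 zss awoj anxb rfw kgbvnl\n"
)

# Same call as above, reading from the in-memory buffer instead of a path.
df = pd.read_fwf(io.StringIO(sample), header=None,
                 widths=[2, int(1e5)], names=['label', 'text'])

# index=False keeps the row numbers out of the output.
# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)

Passing a file name, for example df.to_csv('train.csv', index=False), would write the same content to disk. The widths=[2, ...] setting assumes every label is exactly two characters wide, as in the sample.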
Upvotes: 4
Reputation: 895
As mentioned in the comments, read_csv would be appropriate here.
df = pd.read_csv('train_csv.csv', sep='\t', names=['Sentiments', 'Review'])
   Sentiments   Review
0          -1  alskjdf
1           1    asdfa
2           1     afsd
3          -1      sdf
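Note that sep='\t' assumes the score and the review are separated by a tab. If they are separated by a space instead, which the commas after every word in the question suggest, one alternative is to split each line only at the first run of whitespace and let the csv module handle quoting. This is a sketch under that assumption, written for Python 3; the helper name dat_to_rows and the sample rows are hypothetical.

import csv
import io

def dat_to_rows(lines):
    """Yield [sentiment, review] pairs, splitting only at the first whitespace run."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # maxsplit=1 keeps the review text in one piece
        score = int(parts[0])        # int() accepts "-1", "1", and "+1"
        review = parts[1] if len(parts) > 1 else ""
        yield [score, review]

# Hypothetical input standing in for train.dat.
sample = io.StringIO("-1 bad, really bad\n+1 great film\n")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Sentiments", "Review"])
writer.writerows(dat_to_rows(sample))
print(out.getvalue())

With real files, open("train.dat") would replace sample and open("train.csv", "w", newline="") would replace out. Because csv.writer quotes any field that contains a comma, a review such as "bad, really bad" stays in a single column.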
Upvotes: 0