Python / Read and group data from text file with Pandas

Question

I have a text file as follows:

Sentence:1 Polarity:N 5puan verdim o da anistonun güzel yüzünün hatırına.
Sentence:2 Polarity:N son derece sıkıcı bir filim olduğunu söyleyebilirim.
Sentence:3 Polarity:N ..saçma bir konuyu nasılda filim yapmışlar maşallah
Sentence:4 Polarity:P bence hoş vakit geçirmek için seyredilebilir.
Sentence:5 Polarity:P hoş ve sevimli bir film.
Sentence:6 Polarity:O eşcinsellere pek sempati duymamakla beraber bu filmde sanki onları sevimli göstermeye çalışmışlar gibi geldi.
Sentence:7 Polarity:O itici bir film değildi sonuçta.
Sentence:8 Polarity:N seyrederken bu kadar sinirlendiğim film hatırlamıyorum.
Sentence:9 Polarity:O  J.Aniston ın hiç mi umut yok diye sorduğu sahnede kıracaktım televizyonu!
Sentence:10 Polarity:O kimse yazmamış ben yazıyım:)
Sentence:11 Polarity:P  güzel bi pazar günü şirin bi film izlemek isteyenler için çok güzel.

I want to split this data into a table like this:

Sentence_No - Sentence_Polarity - Sentence_txt
1 - N - 5puan verdim o da anistonun güzel yüzünün hatırına.
2 - N - son derece sıkıcı bir filim olduğunu söyleyebilirim.
3 - N - ..saçma bir konuyu nasılda filim yapmışlar maşallah
4 - P - bence hoş vakit geçirmek için seyredilebilir.

So I think I need to extract the value after "Sentence:", the value after "Polarity:", and the remaining text. I want the data in this form so that I can classify it.

I wrote the code below but it is not working for this purpose:

df = pd.read_csv('SU-Movie-Reviews-Sentences.txt', lineterminator='\n', names=['Sentence_No', 'Sentence_Polarity', 'Sentence_txt'])

Karn Kumar · Accepted Answer

Use the DataFrame's replace method with regex=True. When reading the file with read_csv, pass header=None: by default the first line of the dataset is treated as the header, so you would lose the first sentence. Also use fillna("0"), since the numbering is not consistent and some values come out empty or NaN:

import pandas as pd

df = pd.read_csv("SU-Movie-Reviews-Sentences.txt", header=None).fillna("0")

print(df)
                                                   0
0   Sentence:1 Polarity:N 5puan verdim o da anisto...
1   Sentence:2 Polarity:N son derece sıkıcı bir fi...
2   Sentence:3 Polarity:N ..saçma bir konuyu nasıl...
3   Sentence:4 Polarity:P bence hoş vakit geçirmek...
4      Sentence:5 Polarity:P hoş ve sevimli bir film.
5   Sentence:6 Polarity:O eşcinsellere pek sempati...
6   Sentence:7 Polarity:O itici bir film değildi s...
7   Sentence:8 Polarity:N seyrederken bu kadar sin...
8   Sentence:9 Polarity:O  J.Aniston ın hiç mi umu...
9   Sentence:10 Polarity:O kimse yazmamış ben yazı...
10  Sentence:11 Polarity:P  güzel bi pazar günü şi...
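
One caveat worth checking: read_csv splits on commas by default, so a sentence that contains a comma could be split across columns or raise a parsing error. The sketch below is one possible way around that, reading each line whole with plain Python. It uses a small in-memory sample instead of the real file, and the comma in the first sample sentence is added purely for illustration; the names sample, lines and raw are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for SU-Movie-Reviews-Sentences.txt.
# The comma in the first sentence is added here only to illustrate the problem.
sample = io.StringIO(
    "Sentence:1 Polarity:N 5puan verdim, o da anistonun güzel yüzünün hatırına.\n"
    "Sentence:5 Polarity:P hoş ve sevimli bir film.\n"
)

# Keep every non-empty line as a single string, so commas inside a
# sentence are never interpreted as field separators.
lines = [line.rstrip("\n") for line in sample if line.strip()]

# One column named 0, matching the shape of the frame printed above.
raw = pd.DataFrame({0: lines})
print(raw.shape)  # (2, 1)
```

With the real file, the StringIO object would be replaced by something like open("SU-Movie-Reviews-Sentences.txt", encoding="utf-8"); the encoding is an assumption about how the file was saved.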

This is how to use replace:

>>> df.replace('Sentence:|Polarity:', '', regex=True)
                                                    0
0   1 N 5puan verdim o da anistonun güzel yüzünün ...
1   2 N son derece sıkıcı bir filim olduğunu söyle...
2   3 N ..saçma bir konuyu nasılda filim yapmışlar...
3   4 P bence hoş vakit geçirmek için seyredilebilir.
4                        5 P hoş ve sevimli bir film.
5   6 O eşcinsellere pek sempati duymamakla berabe...
6                 7 O itici bir film değildi sonuçta.
7   8 N seyrederken bu kadar sinirlendiğim film ha...
8   9 O  J.Aniston ın hiç mi umut yok diye sorduğu...
9                   10 O kimse yazmamış ben yazıyım:)
10  11 P  güzel bi pazar günü şirin bi film izleme...
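
The replace call leaves everything in a single column, whereas the question asks for three. A minimal sketch of one way to get there is Series.str.extract with named groups, which become the column names. The regular expression is an assumption based on the sample lines shown in the question (a number, a one-character polarity, then the text), and the two-row frame below stands in for the df read from the file.

```python
import pandas as pd

# Two sample rows copied from the question, in a one-column frame shaped
# like the df produced by read_csv(..., header=None).
df = pd.DataFrame({0: [
    "Sentence:1 Polarity:N 5puan verdim o da anistonun güzel yüzünün hatırına.",
    "Sentence:11 Polarity:P  güzel bi pazar günü şirin bi film izlemek isteyenler için çok güzel.",
]})

# Assumed line layout: "Sentence:<digits> Polarity:<one letter> <text>".
# Each named group becomes a column in the extracted frame.
pattern = (
    r"Sentence:(?P<Sentence_No>\d+)\s+"
    r"Polarity:(?P<Sentence_Polarity>\w)\s+"
    r"(?P<Sentence_txt>.*)"
)
table = df[0].str.extract(pattern)

# Extracted values are strings; convert the sentence number to int.
# This conversion fails if any line did not match the pattern.
table["Sentence_No"] = table["Sentence_No"].astype(int)
print(table)
```

Lines that do not match the pattern come back as NaN in all three columns, so it may be worth checking table.isna().any() before converting types.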
