legends1337

Reputation: 111

Parsing a tab-delimited .txt into a Pandas DataFrame

I have a tab-delimited .txt file that I'm trying to import into a pandas DataFrame in Python, keeping the same layout as the text file, which looks like this:

ham TAB Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...

spam TAB Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's

...

There are many more rows like these (roughly 5,500). I want to load all of them into Python and keep the same two-field layout (label, message) when building a matrix or array from them.
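Each line consists of a label, one tab, and then the message. The following is a minimal sketch of parsing that layout by hand with plain Python and NumPy, without the csv module. It assumes the label never contains a tab. Splitting on the first tab only means a stray tab inside a message stays part of the message. `parse_lines` and `SAMPLE` are hypothetical names used only for illustration.

```python
import io

import numpy as np

# Two sample lines in the same layout as the file: label, a tab, then the message.
SAMPLE = (
    "ham\tGo until jurong point, crazy..\n"
    "spam\tFree entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005.\n"
)


def parse_lines(lines):
    """Hypothetical helper: split each line on the FIRST tab only."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        label, message = line.split("\t", 1)
        rows.append([label, message])
    return rows


rows = parse_lines(io.StringIO(SAMPLE))
# dtype=object keeps each message as a Python string of arbitrary length
matrix = np.array(rows, dtype=object)
print(matrix.shape)  # (2, 2)
```

For the real file you would pass the open file object instead, e.g. `with open("SMSSpamCollection.txt") as f: rows = parse_lines(f)`.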

The current code that I have for this is:

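 import csv
 import pandas as pd

 # Read the tab-separated file row by row with the csv module
 with open("SMSSpamCollection.txt") as f:
     reader = csv.reader(f, delimiter="\t")
     d = list(reader)

 # Build the DataFrame from the list of rows (the reader is exhausted by now)
 d = pd.DataFrame(d)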

This almost does what I need, but I want a DataFrame with two named columns: Y (containing ham or spam) and X (containing the message). Currently I get a 5572 × 2 DataFrame whose columns are just labelled 0 and 1.
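If you keep the csv-based approach from the question, a likely fix is to pass the column labels when constructing the DataFrame. The sketch below substitutes a small in-memory string (`SAMPLE`, a hypothetical stand-in) for the real file so that it runs on its own.

```python
import csv
import io

import pandas as pd

# Stand-in for the real file: two tab-separated lines (label, message).
SAMPLE = "ham\tGo until jurong point, crazy..\nspam\tFree entry in 2 a wkly comp\n"

reader = csv.reader(io.StringIO(SAMPLE), delimiter="\t")
d = list(reader)

# Passing columns= labels the two fields instead of the default 0 and 1.
df = pd.DataFrame(d, columns=["Y", "X"])
print(df)
```

Assigning `df.columns = ["Y", "X"]` after construction would have the same effect.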

Upvotes: 2

Views: 11093

Answers (1)

Błotosmętek

Reputation: 12927

How about this:

import pandas as pd 
d = pd.read_csv("SMSSpamCollection.txt", sep="\t", names=['Y','X'])
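The following is a self-contained variant of the answer's `read_csv` call that you can run without the file. One assumption goes beyond the answer: some messages may contain double-quote characters. In that case, `quoting=csv.QUOTE_NONE` (a standard `read_csv` parameter) tells pandas to treat quotes as ordinary text rather than field delimiters. The final `.to_numpy()` call gives the two-column array mentioned in the question. `SAMPLE` is a hypothetical stand-in for the file contents.

```python
import csv
import io

import pandas as pd

# Stand-in for the file; the second message contains double quotes,
# which the default quoting rules would try to interpret.
SAMPLE = 'ham\tOk lar... Joking wif u oni...\nspam\t"Free" entry in 2 a wkly comp\n'

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep="\t",
    names=["Y", "X"],        # the file has no header row, so supply column names
    quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
)
print(df.shape)  # (2, 2)

# Two-column object array: one [label, message] pair per row
matrix = df.to_numpy()
```

Without `quoting=csv.QUOTE_NONE`, a message that starts with a quote character may lose its quotes or be merged with the following lines, so the row count can come out lower than the number of lines in the file.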

Upvotes: 7
