Reputation: 111
I have a tab-delimited .txt file that I'm trying to import into a DataFrame in Python, keeping the same layout as the text file, which looks like this:
ham TAB Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
spam TAB Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
...
There are many more rows like these (roughly 5,500), and I want to load them all into Python and keep the same layout when creating a matrix array from them.
The code I currently have for this is:
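import csv
import pandas as pd

with open("SMSSpamCollection.txt") as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)  # read every row while the file is still open
d = pd.DataFrame(d)  # build the frame from the rows; the reader itself is already exhausted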
This partly does what I need, but I want a DataFrame with 2 columns: Y (containing ham or spam) and X (containing the message). At the moment I get a [5572,2] DataFrame.
Upvotes: 2
Views: 11093
Reputation: 12927
How about this:
import pandas as pd
d = pd.read_csv("SMSSpamCollection.txt", sep="\t", names=['Y','X'])
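To try this without the real file, here is a minimal sketch that feeds two shortened sample rows through the same call. The in-memory sample text and the quoting=csv.QUOTE_NONE argument are my own additions, not part of the line above. I am assuming the messages may contain stray quote characters that should be kept as plain text, so drop that argument if your file really does use quoted fields.

```python
import csv
import io

import pandas as pd

# Two shortened rows in the same "label<TAB>message" layout as the question's file.
# (Illustrative sample only; the real data lives in SMSSpamCollection.txt.)
sample = (
    "ham\tGo until jurong point, crazy..\n"
    "spam\tFree entry in 2 a wkly comp to win FA Cup final tkts\n"
)

# Passing names= without header= makes pandas treat the first row as data.
# quoting=csv.QUOTE_NONE is an assumption: it keeps any quote characters
# inside a message as literal text instead of field delimiters.
d = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    names=["Y", "X"],
    quoting=csv.QUOTE_NONE,
)

print(d.shape)          # (rows, columns)
print(d["Y"].tolist())  # the ham/spam labels
```

If you would rather keep the csv.reader approach, passing columns=["Y", "X"] to pd.DataFrame along with the list of rows should give the same column names.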
Upvotes: 7