Reputation: 109
I have a .txt file that has the data regarding the total number of queries with valid names. The text inside of the file came out of a SQL Server 19 query output. The database used consists of the results of an algorithm that retrieves the most similar brands related to the query inserted. The file looks something like this:
2 16, 42, 44 A MINHA SAÚDE
3 34 !D D DUNHILL
4 33 #MEGA
5 09 (michelin man)
5 12 (michelin man)
6 33 *MONTE DA PEDRA*
7 35 .FOX
8 33 @BATISTA'S BY PITADA VERDE
9 12 @COM
10 41 + NATUREZA HUMANA
11 12 001
12 12 002
13 12 1007
14 12 101
15 12 102
16 12 104
17 37 112 PC
18 33 1128
19 41 123 PILATES
The 1st column has the Query identifier, the 2nd one has the brand classes where the Query can be located and the 3rd one is the Query itself (the spaces came from the SQL Server output formatting).
I then made a Pandas DataFrame in Google Colaboratory where I wanted the columns to be like the ones in the text file. However, when I ran the code, it gave me this:
The code that I wrote is here:
# Dataframe with the total number of queries with valid names:
df = pd.DataFrame(pd.read_table("/content/drive/MyDrive/data/classes/100/queries100.txt", header=None, names=["Query ID", "Query Name", "Classes Where Query is Present"]))
df
I think that this happens because of the commas in the 2nd column but I'm not quite sure. Any suggestions on why this is happening? I already tried read_csv
and read_fwf
and they were even worse in terms of formatting.
Upvotes: 0
Views: 75
Reputation: 3011
You can use pd.read_fwf()
in this case, as your columns have fixed widths:
import pandas as pd
df = pd.read_fwf(
"/content/drive/MyDrive/data/classes/100/queries100.txt",
colspecs=[(0,20),(21,40),(40,1000)],
header=None,
names=["Query ID", "Query Name", "Classes Where Query is Present"]
)
df.head()
# Query ID Query Name Classes Where Query is Present
# 0 2 16, 42, 44 A MINHA SAÚDE
# 1 3 34 !D D DUNHILL
# 2 4 33 #MEGA
# 3 5 09 (michelin man)
# 4 5 12 (michelin man)
Upvotes: 1