thetinywindow

Import a UTF-8 text file into a pandas DataFrame

Here is an excerpt from the input .txt file (the header row followed by one record):

PT AU BA CA GP RI
J Garcia-Perez, Guillermo; Rossi, Matteo A. C.; Maniscalco, Sabrina Rossi, Matteo/E-4964-2015 Rossi, Matteo/0000-0003-4665-9284; Garcia-Perez, Guillermo/0000-0002-9006-060X IBM Q Experience as a versatile experimental testbed for simulating open quantum systems NPJ QUANTUM INFORMATION 6 1 1 10.1038/s41534-019-0235-y DEC 2020

Currently I use the following code:

import pandas as pd

df = pd.read_fwf('savedrecs-2.txt')
df.head()

However, the rows are not split into the columns defined in the UTF-8 text file: each line ends up in a single column, with the tab characters (\t) still embedded in the text.

Current Output:

0
0   PT\tAU\tBA\tCA\tGP\tRI\tOI\tBE\tZ2\tTI\tX1\tY...
1   J\tGarcia-Perez, Guillermo; Rossi, Matteo A. C...
2   J\tScholes, Colin A.; Kentish, Sandra E.; Qade...
3   J\tVillain-Gambier, M.; Courbalay, M.; Klem, A...
4   J\tShahmahdi, Najmeh; Dehghanzadeh, Reza; Asla...
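The `\t` sequences in that output suggest the file is tab-delimited rather than fixed-width. One quick way to check this before choosing a parser is to look at the raw header line. The sketch below is a minimal illustration; the helper `looks_tab_delimited` is hypothetical, and it uses an in-memory copy of the header row from the question so that it runs on its own.

```python
import io

def looks_tab_delimited(handle, min_tabs=1):
    """Hypothetical helper: True if the first line of `handle` has at least `min_tabs` tabs."""
    first_line = handle.readline()
    return first_line.count('\t') >= min_tabs

# In-memory stand-in for the first line of the export (header row from the question).
sample = io.StringIO("PT\tAU\tBA\tCA\tGP\tRI\n")
print(looks_tab_delimited(sample))  # True

# With the real file (assumed to be UTF-8):
# with open('savedrecs-2.txt', encoding='utf-8') as fh:
#     print(looks_tab_delimited(fh))
```

If this prints `True`, a delimiter-based reader such as `pd.read_csv(..., sep='\t')` is a better fit than `read_fwf`.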

Expected Output (example)

PT            AU    BA  CA  GP  RI
Garcia-Perez  xy    xy  xy  xy  xy
Guillermo     xy    xy  xy  xy  xy


Answers (1)

thetinywindow

The following code, which reads the file as tab-separated values instead of fixed-width fields, appears to return the expected result.

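import pandas as pd

filename = 'savedrecs-8.txt'

# The export is tab-delimited, so split on '\t' and decode as UTF-8.
# (codecs.open(filename, 'rU', 'UTF-8') fails on Python 3.11+, where the
# 'U' mode was removed; read_csv can open and decode the file itself.)
df = pd.read_csv(filename, sep='\t', encoding='utf-8')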
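To see the effect of the separator without the original export, the sketch below parses a small in-memory sample with `pd.read_csv(..., sep='\t')`. The sample is hypothetical: it keeps only three of the export's columns (PT, AU, TI) and reuses values from the record in the question. Because the separator is a tab, the commas and semicolons inside the author field do not split it.

```python
import io
import pandas as pd

# Hypothetical, shortened sample in the same tab-delimited layout as the export.
sample = (
    "PT\tAU\tTI\n"
    "J\tGarcia-Perez, Guillermo; Rossi, Matteo A. C.; Maniscalco, Sabrina\t"
    "IBM Q Experience as a versatile experimental testbed for simulating open quantum systems\n"
)

df = pd.read_csv(io.StringIO(sample), sep='\t')
print(df.columns.tolist())  # ['PT', 'AU', 'TI']

# The AU field holds several authors separated by '; '.
authors = df.loc[0, 'AU'].split('; ')
print(authors)  # ['Garcia-Perez, Guillermo', 'Rossi, Matteo A. C.', 'Maniscalco, Sabrina']
```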
