OGTW

Reputation: 21

Plotting data from two columns in a .txt file (Python)

I'm new to Python on OS X and need to plot data from two columns in a .txt file. On Windows I used the 'x[:,0]' syntax to select columns, but this does not seem to work on Mac. I have tried the following:

f = open(os.path.expanduser("~/Desktop/a.txt.rtf"))

lines=f.readlines()

result=[]

for x in lines:
    result.append(x.split(' ')[0])

for y in lines:
    result.append(y.split(' ')[1]) 

f.close()

plt.plot(x,y)
plt.show()

But it says that the list index is out of range, even though the test file just reads:

1  2
3  4
5  6
7  8

How can that be? Please help!

After solving this I need to know the Mac alternative to the "skip_header =" function (as the file I want to use has the data I need starting 25 rows down...)

Thanks in advance, and sorry if these are easy queries but I just can't make it work :(

Upvotes: 2

Views: 5254

Answers (1)

englealuze

Reputation: 1663

This is not an easy question at all. It is a very good one, and many people face the same problem in their daily work, so it will help others as well!

The error occurs because you are reading a so-called Rich Text Format (RTF) file. The actual content of the file is not what you see on screen, but RTF markup.

Instead of

['1  2', '3  4',...]

f.readlines() actually returns something like

['{\\rtf1\\adeflang1025\\ansi\\ansicpg1252\\uc1\\adeff31507\\deff0\\stshfdbch31505\\stshfloch31506\\stshfhich31506\\stshfbi31507\\...]

Therefore, when you try to index the split line, you get an "index out of range" error.
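To make the failure concrete, here is a minimal sketch. The helper name second_field is made up for this illustration, and the RTF string is just the beginning of the header shown above. It shows that a line containing no spaces splits into a single element, so asking for index 1 fails.

```python
def second_field(line):
    """Return the second space-separated field, or None if there is none.

    Hypothetical helper, only for illustration.
    """
    parts = line.split(' ')
    try:
        return parts[1]
    except IndexError:
        # Fewer than two fields: this is the "list index out of range" case
        return None


# Start of the RTF header from above: control words, no spaces
rtf_line = "{\\rtf1\\adeflang1025\\ansi\\ansicpg1252\\uc1"
# A line from the intended plain-text data (two spaces between the numbers)
data_line = "1  2"

print(len(rtf_line.split(' ')))  # a single element, so [1] does not exist
print(second_field(rtf_line))    # None
print(data_line.split(' '))      # ['1', '', '2']
print(data_line.split())         # ['1', '2']
```

Note the side effect on the real data: with two spaces between the numbers, split(' ') yields an empty string in the middle, whereas split() with no argument splits on any run of whitespace. That is why the examples further down call split() without an argument.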

There are three ways to solve this problem. First, you can convert the RTF file to plain text and read it with readlines() as you did. Second, you can read the RTF with a third-party parser. Third, you can parse the RTF yourself with regular expressions. Here are some useful links:

convert RTF

parse RTF

Hope it is helpful.

Update

Although it is not entirely clear what you want to plot, I guess you want a plot of the 2nd column against the 1st column of your data file. If so, you need to modify your code a bit. Below is an example.

Assume your a.txt file (plain text, not RTF) has the content

1  2
3  4
5  6
7  8

You can do this to draw an x-y plot with the 1st column as x and the 2nd column as y.

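import os
import matplotlib.pyplot as plt

x, y = [], []

# Read the file line by line; the with-block closes it automatically
with open(os.path.expanduser("a.txt")) as f:
    for line in f:
        fields = line.split()  # split on any run of whitespace
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        # Convert to float so matplotlib treats the values as numbers
        x.append(float(fields[0]))
        y.append(float(fields[1]))

print(x, y)

plt.plot(x, y)
plt.show()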

Or, with a one-liner:

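import os
import matplotlib.pyplot as plt

with open(os.path.expanduser("a.txt")) as f:
    lines = f.readlines()

# Convert each field to float, then transpose rows into columns (blank lines skipped)
x, y = zip(*(map(float, line.split()) for line in lines if line.strip()))

print(x, y)

plt.plot(x, y)
plt.show()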
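As for the follow-up about skipping header rows: the skip_header argument belongs to NumPy's genfromtxt and is not specific to Windows, so it should behave the same on a Mac once the file is plain text. If you prefer to stay with plain Python, a rough equivalent can be sketched with itertools.islice. The function read_two_columns and its skip_header parameter are names chosen for this example only.

```python
from itertools import islice


def read_two_columns(lines, skip_header=0):
    """Parse the first two whitespace-separated columns as floats.

    `lines` can be an open file or any iterable of strings;
    the first `skip_header` lines are ignored.
    """
    x, y = [], []
    for line in islice(lines, skip_header, None):
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        x.append(float(fields[0]))
        y.append(float(fields[1]))
    return x, y


# Made-up sample: two header lines followed by data
sample = ["header line 1\n", "header line 2\n", "1  2\n", "3  4\n"]
x, y = read_two_columns(sample, skip_header=2)
print(x, y)  # [1.0, 3.0] [2.0, 4.0]
```

For your file you would pass the open file object and skip_header=25, then hand x and y to plt.plot as above.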

Upvotes: 3
