I have the following code:
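import pandas as pd

datadicts = []
with open("input.txt") as f:
    for line in f:
        # line[':'] is not a valid slice: indexing a string with ':' raises TypeError.
        # Each ':' would need real bounds, e.g. line[a:b].
        datadicts.append({'col1': line[':'], 'col2': line[':'], 'col3': line[':'], 'col4': line[':']})
df = pd.DataFrame(datadicts)
df = df.drop([0])  # drop the row with index label 0 (the first row)
print(df)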
I am pulling chunks of data from an unformatted text file. When I open the file, it looks something like this, only much larger:
00 2381 1.3 3.4 1.8 265879 Name
34 7879 7.6 4.2 2.1 254789 Name
45 65824 2.3 3.4 1.8 265879 Name
58 3450 1.3 3.4 1.8 183713 Name
69 37495 1.3 3.4 1.8 137632 Name
73 458913 1.3 3.4 1.8 138024 Name
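If every line has the same seven whitespace-separated fields, as in the sample above, the loop from the question can split each line instead of slicing it by character position. A minimal sketch under that assumption: it keeps fields 1, 3, 5 and 6 (the four columns in the output table below), converts the numbers with int() and float() so pandas stores them as numeric columns, and reads the sample rows from an io.StringIO buffer in place of input.txt. The field positions and column names are illustrative assumptions.

```python
import io
import pandas as pd

# Sample rows from the question; io.StringIO stands in for open("input.txt").
sample = """00 2381 1.3 3.4 1.8 265879 Name
34 7879 7.6 4.2 2.1 254789 Name
45 65824 2.3 3.4 1.8 265879 Name
58 3450 1.3 3.4 1.8 183713 Name
69 37495 1.3 3.4 1.8 137632 Name
73 458913 1.3 3.4 1.8 138024 Name
"""

datadicts = []
with io.StringIO(sample) as f:
    for line in f:
        fields = line.split()  # split on any run of whitespace
        if len(fields) != 7:   # assumption: skip blank or malformed lines
            continue
        datadicts.append({'col1': int(fields[1]),
                          'col2': float(fields[3]),
                          'col3': int(fields[5]),
                          'col4': fields[6]})

df = pd.DataFrame(datadicts)
print(df)
```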
Here are the things I'm having trouble doing with this data:
  Col1  Col2    Col3  Col4
  2381   3.4  265879  Name
  7879   4.2  254789  Name
 65824   3.4  265879  Name
  3450   3.4  183713  Name
 37495   3.4  137632  Name
458913   3.4  138024  Name
Everything is right-aligned under the column headers, which looks strange. Any ideas how to fix this?
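Right alignment is how pandas lays out printed columns by default; it only affects the printout, not the values stored in the DataFrame. If a left-aligned printout is preferred anyway, one option is to pad each cell by hand. A minimal sketch; left_aligned is a hypothetical helper written for this example, not a pandas function, and it omits the index.

```python
import pandas as pd


def left_aligned(df: pd.DataFrame, gap: int = 2) -> str:
    """Render df as text with every column left-aligned (index omitted)."""
    cells = df.astype(str)
    # Width of each column = longest of its header and its cell strings.
    widths = {c: max(len(str(c)), int(cells[c].str.len().max())) for c in cells.columns}
    sep = ' ' * gap
    lines = [sep.join(str(c).ljust(widths[c]) for c in cells.columns)]
    for row in cells.itertuples(index=False):
        lines.append(sep.join(v.ljust(widths[c]) for c, v in zip(cells.columns, row)))
    # Drop trailing padding on each line.
    return '\n'.join(line.rstrip() for line in lines)


df = pd.DataFrame({'Col1': [2381, 458913], 'Col2': [3.4, 4.2], 'Col4': ['Name', 'Name']})
print(left_aligned(df))
```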
Also, if my data can't be analyzed with Python functions in its current form, does anyone know how I can fix it so the analysis runs correctly?
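Whether the data can be analyzed mostly depends on the column types. Values cut out of a line with slicing are strings, so pandas stores them as text columns, and numeric operations such as mean() do not work on them as numbers. A minimal sketch of converting such columns with pd.to_numeric, using a small hypothetical frame built from the sample values:

```python
import pandas as pd

# Hypothetical frame shaped like the question's loop output: every value is a string.
raw = pd.DataFrame({'col1': ['2381', '7879', '65824'],
                    'col2': ['3.4', '4.2', '3.4'],
                    'col4': ['Name', 'Name', 'Name']})
print(raw.dtypes)  # text columns, not numbers

clean = raw.copy()
for col in ['col1', 'col2']:
    # errors='coerce' turns unparseable entries into NaN instead of raising
    clean[col] = pd.to_numeric(clean[col], errors='coerce')

print(clean.dtypes)          # col1 integer, col2 float
print(clean['col2'].mean())  # now a numeric mean
```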
Any help would be greatly appreciated. I hope I've laid out all of my needs clearly. I am new to Python, and I'm not sure if I'm using all the proper terminology.
You can use the pandas.read_csv() function to accomplish this very easily.
- txt2pd.txt is a text file containing a copy/paste of your sample data above
- sep uses a regex pattern to split on one or more consecutive whitespace characters
- names uses a list to set your column names
- skiprows skips the first row, per your requirements

import pandas as pd

keep = ['col1', 'col3', 'col5', 'col6']  # the columns you want to keep
df = pd.read_csv('txt2pd.txt',
                 sep=r'\s+',  # raw string, so '\s' is not treated as a string escape
                 names=['col0', 'col1', 'col2', 'col3', 'col4', 'col5', 'col6'],
                 skiprows=1)
df = df[keep]
print(df)
     col1  col3    col5  col6
0    7879   4.2  254789  Name
1   65824   3.4  265879  Name
2    3450   3.4  183713  Name
3   37495   3.4  137632  Name
4  458913   3.4  138024  Name
Using df.describe(), you can output a simple, high-level analysis. (Anything further should be the subject of a new question.)
                col1      col3           col5
count       5.000000  5.000000       5.000000
mean   114712.200000  3.560000  196007.400000
std    194048.545838  0.357771   61762.106621
min      3450.000000  3.400000  137632.000000
25%      7879.000000  3.400000  138024.000000
50%     37495.000000  3.400000  183713.000000
75%     65824.000000  3.400000  254789.000000
max    458913.000000  4.200000  265879.000000
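By default describe() summarizes only the numeric columns, which is why col6 does not appear above. A short sketch that rebuilds the five rows shown earlier so it runs on its own: passing include='all' adds count, unique, top and freq for the text column, and single statistics can also be pulled out directly.

```python
import pandas as pd

# The five rows produced by the read_csv call above.
df = pd.DataFrame({'col1': [7879, 65824, 3450, 37495, 458913],
                   'col3': [4.2, 3.4, 3.4, 3.4, 3.4],
                   'col5': [254789, 265879, 183713, 137632, 138024],
                   'col6': ['Name'] * 5})

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include='all')  # adds unique/top/freq for col6
print(full_summary)

# Individual statistics are ordinary method calls.
print(df['col1'].mean(), df['col5'].median())
```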