Reputation: 693
I have thousands of files in a folder, and I'm reading them with glob. I want to extract the first column of each file (the text files have no header row) and store it in a DataFrame, because I need to build tables from calculations across multiple files. Here is sample data from two files:
File 1:
O.U20,99.73000,75538,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,58974,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4274,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48584,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27304,3295,-9.250,468,2,-9.000,3730,2,-9.188
File 2:
O.U20,99.73000,75711,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,59142,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4447,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48765,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27477,3295,-9.250,468,2,-9.000,3730,2,-9.188
This is the code I'm working on:
import glob
for file in glob.glob("C:/Users/Data/*"):
    print(file)
    myfile = open(file,"r")
    lines = myfile.readlines()
    for line in lines:
        print(line.strip()[0])
However, this prints only the first character of each line (and prints the output twice, which is another issue, as I want it printed just once):
O
O
O
O
S
S
I want the output to be
O.U20
O.Z20
O.H21
O.M21
S3.U20
S3.Z20
in a DataFrame, so that I can create further tables. I thought of using multiple columns; however, the O symbols have 5 characters (e.g. O.U20) and the S symbols have 6 (e.g. S3.U20).
Upvotes: 0
Views: 1096
Reputation: 129
First convert each .txt file to .csv (strictly optional, since pandas' read_csv can open the .txt files directly); after that you can read them with pandas into a DataFrame:
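import glob
import pandas as pd

# Copy every .txt file to a .csv file with the same base name.
for each in glob.glob('*.txt'):
    with open(each, 'r') as file:
        content = file.readlines()
    with open('{}.csv'.format(each[0:-4]), 'w') as file:
        file.writelines(content)

# Read each .csv; header=None because the files have no header row.
# Note: dataframe is overwritten on every pass, so it ends up holding the last file.
for each in glob.glob('*.csv'):
    dataframe = pd.read_csv(each, skiprows=0, header=None, index_col=0)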
then:
dataframe.reset_index(inplace=True)
output:
>>> print(dataframe[0])
0     O.U20
1     O.Z20
2     O.H21
3     O.M21
4    S3.U20
5    S3.Z20
Name: 0, dtype: object
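As an alternative to the steps above, here is a rough sketch that skips the .csv copy and collects the first column of every file into a single DataFrame. It also shows why the question's code printed a single letter: indexing a string with [0] returns its first character, whereas splitting on the comma returns the first field. The names first_field, first_columns, "symbol" and "source_file" are made up for illustration, and the sketch assumes every file is comma-separated with no header row, as in the sample data.

```python
import glob
import os

import pandas as pd


def first_field(line):
    """Return the first comma-separated field of a line of text."""
    # line.strip()[0] would give only the first character ("O" or "S");
    # splitting on "," gives the whole symbol, whatever its length.
    return line.strip().split(",")[0]


def first_columns(pattern):
    """Collect column 0 of every file matching `pattern` into one DataFrame.

    `pattern` is a glob pattern such as "C:/Users/Data/*".
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        # header=None: the files have no header row.
        # usecols=[0]: parse only the first column.
        # dtype=str: keep the symbols as text.
        frame = pd.read_csv(path, header=None, usecols=[0], dtype=str)
        frame = frame.rename(columns={0: "symbol"})
        # Remember which file each row came from (hypothetical extra column).
        frame["source_file"] = os.path.basename(path)
        frames.append(frame)
    if not frames:
        # No file matched the pattern: return an empty frame with the same columns.
        return pd.DataFrame(columns=["symbol", "source_file"])
    return pd.concat(frames, ignore_index=True)
```

Calling first_columns("C:/Users/Data/*") would give one row per line per file. If only the distinct symbols are needed, something like drop_duplicates(subset="symbol") on the result should avoid the repeated output the question mentions.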
Upvotes: 1