Extract a column from text file and store it in dataframe in Python

Question

I have thousands of files in a folder, which I read using glob. I want to print the first column of each file (the text files have no header row) and store it in a dataframe, because I need to build tables from calculations across multiple files. Here is sample data from two files:

File 1:

O.U20,99.73000,75538,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,58974,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4274,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48584,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27304,3295,-9.250,468,2,-9.000,3730,2,-9.188

File 2:

O.U20,99.73000,75711,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,59142,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4447,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48765,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27477,3295,-9.250,468,2,-9.000,3730,2,-9.188

This is the code I'm working on:

import glob
for file in glob.glob("C:/Users/Data/*"):
    print(file)
    myfile = open(file,"r")
    lines = myfile.readlines()
    for line in lines:
         print(line.strip()[0])

However, this prints the output twice, which is a separate issue: I want the output printed only once.

I want the output to be

O.U20
O.Z20
O.H21
O.M21
S3.U20
S3.Z20

in a dataframe, so that I can create further tables. I thought of slicing a fixed number of characters, but the O symbols and the S3 symbols have different lengths (for example, O.U20 has 5 characters and S3.U20 has 6).

Bahram Jannesar · Accepted Answer

First, convert the .txt files to .csv; you can then read them with pandas and turn them into a dataframe:

import glob
import pandas as pd

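# Copy each .txt file to a .csv file with the same base name
for txt_path in glob.glob('*.txt'):
    with open(txt_path, 'r') as src:
        content = src.readlines()
    with open('{}.csv'.format(txt_path[:-4]), 'w') as dst:
        dst.writelines(content)

# Read each .csv file; the first column becomes the index.
# Note: after the loop, dataframe holds only the last file read.
for csv_path in glob.glob('*.csv'):
    dataframe = pd.read_csv(csv_path, skiprows=0, header=None, index_col=0)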

Then reset the index so that the first column becomes a regular column again:

dataframe.reset_index(inplace=True)

Output:

>>> print(dataframe[0])
0     O.U20
1     O.Z20
2     O.H21
3     O.M21
4    S3.U20
5    S3.Z20
Name: 0, dtype: object
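
As a possible alternative, pandas can usually read comma-separated text directly regardless of the file extension, so the copy to .csv may not be needed. The sketch below reads only the first column of every matching file and stacks the results into one dataframe. The helper name first_columns and the column names symbol and source_file are assumptions made for illustration; they do not appear in the original question or answer.

```python
import glob

import pandas as pd


def first_columns(pattern):
    """Return the first column of every file matching `pattern`.

    Hypothetical helper for illustration; assumes each file is
    comma-separated text with no header row.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        # header=None: no header row; usecols=[0]: parse only the first column
        col = pd.read_csv(path, header=None, usecols=[0])
        col = col.rename(columns={0: "symbol"})
        # Remember which file each row came from
        col["source_file"] = path
        frames.append(col)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=["symbol", "source_file"])
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Path taken from the question; adjust it to your own folder
    symbols = first_columns("C:/Users/Data/*")
    print(symbols["symbol"])
```

If the same symbols repeat across files and only the distinct values are wanted, symbols["symbol"].drop_duplicates() should give each one once. In the question's own loop, line.strip()[0] indexes the string and so yields a single character; line.strip().split(",")[0] would yield the whole first field.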
