Mustafa Yousuf
Mustafa Yousuf

Reputation: 23

Combine row data from multiple txt files to a single data frame in column format

I have row data in each text file in the following format

File 1

   Sample 1, 24/07/2017 13:26:08
0  Peak at 1219 , 1.864
1  Peak at 1092 , 0.412
2  Peak at 1358 , 1.661

File 2

   Sample 2, 24/07/2017 14:28:15
0  Peak at 1219 , 1.544
1  Peak at 1092 , 0.315
2  Peak at 1358 , 1.564

File 3

   Sample 3, 24/07/2017 15:31:05
0  Peak at 1219 , 1.954
1  Peak at 1092 , 0.524
2  Peak at 1358 , 1.423

I want to combine the data from all the files and create a single data frame in a column format like this.

  Sample No  Date        Time      Peak at 1219  Peak at 1092  Peak at 1358 
0     1      24/07/2017  13:26:08    1.864        0.412          1.661
1     2      24/07/2017  13:28:15    1.544        0.315          1.564
2     3      24/07/2017  13:31:05    1.954        0.524          1.423

Can anyone please help with a code. Thanks alot

Upvotes: 0

Views: 68

Answers (1)

jezrael
jezrael

Reputation: 862441

There is main function concat, which create big df. But need set_index for align data.

Then transpose df by T and convert second level selected by get_level_values to_datetime.

For first and second column use insert with DatetimeIndex.date and DatetimeIndex.time.

Last remove second level by reset_index with drop=True, rename_axis for column name and last reset_index again:

dfs = [df1,df2,df3]
#set first column to index
dfs = [x.set_index(x.columns[0]) for x in dfs]

df = pd.concat(dfs, 1, keys = range(1, len(dfs) + 1)).T
print (df)
                       Peak at 1219  Peak at 1092  Peak at 1358
1 24/07/2017 13:26:08         1.864         0.412         1.661
2 24/07/2017 14:28:15         1.544         0.315         1.564
3 24/07/2017 15:31:05         1.954         0.524         1.423

print (df.index.labels[0])
FrozenNDArray([0, 1, 2], dtype='int8')

dates = pd.to_datetime(df.index.get_level_values(1))
df.insert(0, 'Date', dates.date)
df.insert(1, 'Time', dates.time)
df = df.reset_index(level=1, drop=True).rename_axis('Sample No').reset_index()
print (df)
   Sample No        Date      Time  Peak at 1219  Peak at 1092  Peak at 1358
0          1  2017-07-24  13:26:08         1.864         0.412         1.661
1          2  2017-07-24  14:28:15         1.544         0.315         1.564
2          3  2017-07-24  15:31:05         1.954         0.524         1.423

Upvotes: 1

Related Questions