inobrevi

Reputation: 51

Grouping data series by timestamp in a pandas DataFrame (Python)

I have a pandas DataFrame like the one below:

                                    redacted_name_1  \
0  [1628377576.0, 1628377939.98, 1628377942.04, 1...   
1  [295.257080078125, 295.1255187988281, 295.2570...   

                                   redacted_name_2   \
0  [1628377494.927, 1628377855.377, 1628377957.39...   
1  [9.3e-09, 9.3e-09, 9.2e-09, 9.3e-09, 9.2e-09, ...   

                                   redacted_name_3   \
0  [1628377543.443, 1628377903.8830001, 162837826...   
1  [1.7e-08, 1.7e-08, 1.7e-08, 1.7e-08, 1.7e-08, ...   

                                   redacted_name_4   \
0  [1628377235.24, 1628377840.33, 1628378440.54, ...   
1  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...   

Row 0 of every column holds a list of timestamps, and row 1 holds the matching list of values.
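A minimal way to reproduce this layout for experimenting, assuming the cells already hold Python lists (not strings). It uses only the first three values of the first two columns as they appear in the truncated printout, so 295.2570 is just the visible prefix of the third reading:

```python
import pandas as pd

# Toy reproduction of the layout described above (values copied from the
# truncated printout; the real lists are longer).
# Row 0 holds each column's timestamps, row 1 the matching readings.
df = pd.DataFrame({
    "redacted_name_1": [
        [1628377576.0, 1628377939.98, 1628377942.04],
        [295.257080078125, 295.1255187988281, 295.2570],
    ],
    "redacted_name_2": [
        [1628377494.927, 1628377855.377, 1628377957.39],
        [9.3e-09, 9.3e-09, 9.2e-09],
    ],
})

print(df.loc[0, "redacted_name_1"])  # timestamps of the first column
print(df.loc[1, "redacted_name_1"])  # readings of the first column
```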

To make the data easier for me to handle, I would like to merge the columns by timestamp into something like this:

Timestamp, redacted_name_1, redacted_name_2, redacted_name_3, redacted_name_4
1628377576, 295.257080078125, NaN, NaN, NaN
1628377494, NaN, 9.3e-09, NaN, NaN

Timestamps should be rounded to the nearest second.

If two readings fall on the same (rounded) timestamp, they should be placed in the same row.

There should be no duplicate timestamps.
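Rounding to the nearest second has one subtlety worth knowing: NumPy's np.round rounds exact halves to the nearest even integer ("banker's rounding"). A small sketch with two hypothetical half-second timestamps (1628377576.5 and 1628377577.5 are made up for illustration) compares it with a round-half-up alternative:

```python
import numpy as np

# Two real timestamps from the question plus two hypothetical exact halves.
ts = np.array([1628377494.927, 1628377855.377, 1628377576.5, 1628377577.5])

# np.round sends exact .5 values to the nearest even integer.
half_even = np.round(ts).astype(np.int64)

# Round-half-up alternative, if .5 should always round upwards.
half_up = np.floor(ts + 0.5).astype(np.int64)

print(half_even)
print(half_up)
```

For readings that are rarely exactly on a half second this makes no practical difference, but it explains why two sources can disagree by one second on such values.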

Upvotes: 1

Views: 66

Answers (1)

jezrael

Reputation: 862511

Use a list comprehension to create one Series per column, join them together with concat, and finally round the index, converting it to datetimes if necessary:

print (df)
                                 redacted_name_1  \
0     [1628377576.0,1628377939.98,1628377942.04]   
1  [295.257080078125,295.1255187988281,295.2570]   

                                 redacted_name_2  
0  [1628377494.927,1628377855.377,1628377957.39]  
1                      [9.3e-09,9.3e-09,9.2e-09] 

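import numpy as np
import pandas as pd

# if necessary, convert strings such as "[1.0, 2.0]" to lists
# (in pandas >= 2.1, applymap is deprecated in favor of DataFrame.map)
#import ast
#df = df.applymap(ast.literal_eval)

# one Series per column: values from row 1, indexed by the timestamps in row 0
L = [pd.Series(df.loc[1, x], index=df.loc[0, x], name=x) for x in df.columns]
# outer-join all Series on their timestamps
df = pd.concat(L, axis=1)

# round the float timestamps to whole seconds
df.index = np.round(df.index).astype(np.int64)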
print (df)
            redacted_name_1  redacted_name_2
1628377495              NaN     9.300000e-09
1628377576       295.257080              NaN
1628377855              NaN     9.300000e-09
1628377940       295.125519              NaN
1628377942       295.257000              NaN
1628377957              NaN     9.200000e-09

Or, to convert the rounded index to datetimes:

df.index = pd.to_datetime(np.round(df.index).astype(np.int64), unit='s')
print (df)
                     redacted_name_1  redacted_name_2
2021-08-07 23:04:55              NaN     9.300000e-09
2021-08-07 23:06:16       295.257080              NaN
2021-08-07 23:10:55              NaN     9.300000e-09
2021-08-07 23:12:20       295.125519              NaN
2021-08-07 23:12:22       295.257000              NaN
2021-08-07 23:12:37              NaN     9.200000e-09
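Both variants round the index only after pd.concat has joined the Series, so readings from different columns that land on the same second after rounding stay in separate rows with a duplicated index, which the question wanted to avoid. One possible extension (not part of the answer above) is to collapse such rows with groupby(level=0).first(). A minimal sketch with two hypothetical readings taken 0.1 s apart:

```python
import numpy as np
import pandas as pd

# Hypothetical readings 0.1 s apart: both timestamps round to 1628377576.
s1 = pd.Series([295.257], index=[1628377576.0], name="redacted_name_1")
s2 = pd.Series([9.3e-09], index=[1628377575.9], name="redacted_name_2")

df = pd.concat([s1, s2], axis=1)
df.index = np.round(df.index.to_numpy()).astype(np.int64)
print(df.index.is_unique)  # False: two rows share 1628377576

# Collapse rows that share a rounded second; first() takes the first
# non-null value of each column within each group (groups come out sorted).
merged = df.groupby(level=0).first()
print(merged)
```

If a single column can have two readings in the same second, first() keeps whichever comes first in row order; mean() or last() may suit the data better.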

Upvotes: 1
