Kraton

Reputation: 65

Resample a dataframe with a column of lists

Trying to resample a dataframe in pandas. I receive as input a .csv like this (the lists in the Data column are stored as strings):

Name,Timestamp,Data
A1,5.26,"[1.0,1.2,1.9]"
A1,5.28,"[1.8,2.1,3.9]"
A1,5.30,"[1.2,1.4,0.9]"
A1,5.32,"[...]"
...
A2,5.26,"[...]"
A2,5.28,"[...]"
A2,5.30,"[...]"
A2,5.32,"[...]"
...
A3,5.26,"[...]"
A3,5.28,"[...]"
A3,5.30,"[...]"
A3,5.32,"[...]"

The data are recorded at 50 Hz (one sample every 20 ms). I want to resample to 25 Hz (one sample every 40 ms).

I converted the Data column from string to an actual list with

df['Data'] = df['Data'].apply(ast.literal_eval)

and the Timestamp into seconds with:

df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')

I know I have to use the .resample() function, so I tried:

df.groupby('Name').resample("40L", on='Timestamp')

This doesn't give me errors, but it doesn't seem to resample at all: I get the same number of rows with the same data, just the Timestamp column converted to Datetime (and if I add .mean() at the end of the resample call, it gives me the error "No numeric types to aggregate").

I want my table after the resample to look like:

Name Timestamp  Data
A1    5.26     [...]
A1    5.30     [...]
...
A2    5.26     [...]
A2    5.30     [...]
...
A3    5.26     [...]
A3    5.30     [...]

What should I do?

Upvotes: 2

Views: 689

Answers (2)

sophros

Reputation: 16758

If you do not want to interpolate the samples and only want to keep every other one, it is enough to drop every other row, keeping the first:

df.groupby('Name').apply(lambda x: x.iloc[::2]) 

Any interpolation of the samples would require some information on the aggregation algorithm you would like to employ, as Quang Hoang suggested in the comments; the sketch below shows one possibility.
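
For illustration only, a minimal sketch of one such aggregation: an element-wise mean of each pair of consecutive samples. The helper name mean_of_pairs and the choice of a pairwise mean are assumptions, not something the question specifies:

import numpy as np

# Purely illustrative: aggregate every pair of consecutive samples with an
# element-wise mean (one possible choice of aggregation).
def mean_of_pairs(group):
    values = np.array(group['Data'].tolist())      # shape (n_rows, 3)
    n_pairs = len(values) // 2
    pairs = values[:n_pairs * 2].reshape(n_pairs, 2, -1)
    out = group.iloc[::2].head(n_pairs).copy()     # keep the first row of each pair
    out['Data'] = pairs.mean(axis=1).tolist()
    return out

df.groupby('Name', group_keys=False).apply(mean_of_pairs)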


Resampling approach

Please note that to obtain alignment on the Timestamp you need to set base=20 in this particular situation (please refer to the documentation for the rationale).

# assumes a datetime column 'Timestamp2', e.g. df['Timestamp2'] = pd.to_datetime(df['Timestamp'], unit='s')
for name, df2 in df.groupby('Name'):
    df3 = pd.DataFrame(df2.resample("40L", on='Timestamp2', base=20, convention='start').apply(lambda r: r.iloc[0]))
    df3 = df3.set_index('Timestamp')
    print(name, df3.drop(columns=['Timestamp2']))

Resulting in:

          Name             Data
Timestamp                      
5.26        A1  [1.0, 1.2, 1.9]
5.30        A1  [1.2, 1.4, 0.9]
          Name             Data
Timestamp                      
5.26        A2  [1.0, 1.2, 1.9]
5.30        A2  [1.2, 1.4, 0.9]

Selecting samples by index

A different approach that does not use resample but yields the same result (since you are not using the samples in-between the desired sample points):

for name, df2 in df.groupby('Name'):
    df2['Timestamp2'] = pd.to_datetime(df2['Timestamp'], unit='s')
    # keep only the rows whose timestamp falls on the 40 ms grid starting at the first sample
    first = df2['Timestamp2'].iloc[0]
    selector = pd.date_range(start=first, freq='40L', periods=len(df2) // 2 + 1)
    df3 = df2[df2['Timestamp2'].isin(selector)]
    print(df3.drop(columns=['Timestamp2']))

Result:

  Name  Timestamp             Data
0   A1       5.26  [1.0, 1.2, 1.9]
2   A1       5.30  [1.2, 1.4, 0.9]
  Name  Timestamp             Data
4   A2       5.26  [1.0, 1.2, 1.9]
6   A2       5.30  [1.2, 1.4, 0.9]

Please note that I copied the 'A1' data for the 'A2' Name label.

Upvotes: 1

Quang Hoang

Reputation: 150815

Your problem is to convert the data part into actual numeric data. ast.literal_eval doesn't cut it because you cannot perform arithmetic operations on lists. Here's what I would do:

df = pd.read_csv('your.csv')
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')

# strip the surrounding brackets, split "[a,b,c]" on commas and cast to float columns
df = df.join(df['Data'].str[1:-1]
                       .str.split(',', expand=True)
                       .astype(float)
            )

# resample
df.groupby('Name').resample('40L', on='Timestamp').mean()

After that, your df would be something like:

                                0     1    2
Name Timestamp                              
A1   1970-01-01 00:00:05.240  1.0  1.20  1.9
     1970-01-01 00:00:05.280  1.5  1.75  2.4
     1970-01-01 00:00:05.320  1.4  1.65  2.9
     1970-01-01 00:00:05.360  1.5  1.75  2.4
     1970-01-01 00:00:05.400  1.2  1.40  0.9
A2   1970-01-01 00:00:05.240  1.0  1.20  1.9
     1970-01-01 00:00:05.280  1.5  1.75  2.4
     1970-01-01 00:00:05.320  1.4  1.65  2.9
     1970-01-01 00:00:05.360  1.5  1.75  2.4
     1970-01-01 00:00:05.400  1.2  1.40  0.9
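
If you then want to get back to the layout shown in the question (one list per row and a float-seconds Timestamp), a minimal sketch, assuming the resampled frame above with its integer column names 0, 1, 2 (the name out is just illustrative):

out = df.groupby('Name').resample('40L', on='Timestamp').mean()

# re-assemble the three numeric columns into a single list column and turn the
# datetime level of the index back into float seconds
out['Data'] = out[[0, 1, 2]].values.tolist()
out = out.drop(columns=[0, 1, 2]).reset_index()
out['Timestamp'] = out['Timestamp'].astype('int64') / 1e9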

Upvotes: 1
