Abhis

Reputation: 605

T-test on each row in pandas

I have a DataFrame and I'm trying to apply a T-test to each row, but it gives me NaN for every row.

Code:

from scipy.stats import ttest_ind, ttest_rel
import pandas as pd

df_stat = df_stat[['day', 'hour', 'CallerObjectId', 'signals_normalized', 'presence_normalized']]

def ttest(a, b):
    t = ttest_ind(a, b)
    return t

df_stat['ttest'] = df_stat.apply(lambda row: ttest(row['presence_normalized'], row['signals_normalized']), axis=1)
print(df_stat)

Output:

            day  hour                        CallerObjectId  signals_normalized  presence_normalized       ttest
0    2021-04-04     9  287b19b7-32ce-4617-94b1-57a632f6f147            0.062500             0.514461  (nan, nan)
1    2021-04-04    16  287b19b7-32ce-4617-94b1-57a632f6f147            0.187500             1.000000  (nan, nan)
2    2021-04-04    17  287b19b7-32ce-4617-94b1-57a632f6f147            0.187500             0.895121  (nan, nan)
3    2021-04-04    18  287b19b7-32ce-4617-94b1-57a632f6f147            0.062500             0.608823  (nan, nan)
4    2021-04-04    19  287b19b7-32ce-4617-94b1-57a632f6f147            1.000000             0.716623  (nan, nan)
5    2021-04-04    20  287b19b7-32ce-4617-94b1-57a632f6f147            0.062500             0.314928  (nan, nan)

Upvotes: 2

Views: 618

Answers (1)

Swier

Reputation: 4186

A T-test compares two samples (two sets of observations), but you're using it to compare two single values.

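Internally, a T-test divides by the sample variances, and the sample variance of a single value is undefined: with Bessel's correction it is 0 / (n - 1) = 0 / 0. The test also has n1 + n2 - 2 = 0 degrees of freedom. So a T-test on two individual values ends up dividing by zero, which is why every row returns (nan, nan). (See Wikipedia.)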
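To see this outside of pandas, here is a minimal sketch that feeds the values from the first row of the question straight into `ttest_ind`. Because `apply(..., axis=1)` hands the function one number per column, each side of the test is a sample of size 1. The variable names are illustrative only.

```python
import warnings

import numpy as np
from scipy.stats import ttest_ind

# Values from row 0 of the question: each "sample" holds a single number.
presence = [0.514461]
signals = [0.0625]

with warnings.catch_warnings():
    # numpy/scipy warn about zero degrees of freedom; silence that for the demo.
    warnings.simplefilter("ignore")
    single_var = np.var(presence, ddof=1)  # sample variance of one value: 0 / 0
    result = ttest_ind(presence, signals)

print(single_var)                       # nan
print(result.statistic, result.pvalue)  # nan nan
```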

The values you're testing appear to be aggregated per hour, so you should probably run the T-test on the raw values before they are aggregated. Alternatively, you could run one T-test per day on your current values, comparing that day's hourly values of the two columns.
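A minimal sketch of the per-day alternative, assuming `df_stat` has the columns shown in the question. The six sample rows are copied from the question's output (`CallerObjectId` omitted); they all fall on one day, so there is a single group here. `per_day` and `n_hours` are hypothetical names. The same `ttest_ind` call works unchanged on raw, unaggregated values if you have them.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Sample rows taken from the question's output.
df_stat = pd.DataFrame({
    "day": ["2021-04-04"] * 6,
    "hour": [9, 16, 17, 18, 19, 20],
    "signals_normalized": [0.0625, 0.1875, 0.1875, 0.0625, 1.0, 0.0625],
    "presence_normalized": [0.514461, 1.0, 0.895121, 0.608823, 0.716623, 0.314928],
})

rows = []
for day, group in df_stat.groupby("day"):
    # Each side is now a sample of all hourly values for that day, not one number.
    res = ttest_ind(group["presence_normalized"], group["signals_normalized"])
    rows.append({"day": day, "n_hours": len(group), "t": res.statistic, "p": res.pvalue})

per_day = pd.DataFrame(rows)
print(per_day)
```

This keeps `ttest_ind` as in the question. If the two values in each hour are paired measurements, `ttest_rel` (also imported in the question) may be a better fit; it takes the same two arguments.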

Upvotes: 1
