Mike Ruigrok

Reputation: 13

Converting an entire DataFrame's values to unique integers for a Fisher's test

I would like to convert the string values in an entire DataFrame to unique integer IDs. Below is a simplified version of what I want to do; the real data has 20+ columns and 100,000+ rows. I need the conversion in order to run a Fisher test per row, which needs distinct integers to detect differences between column groups.

X  col1  col2  col3
1   0/0   1/1   0/0
2   0/2   0/0   1/1
3   1/2   0/2   1/1
4   0/0   0/0   0/0

to

X  col1  col2  col3
1     1     2     1
2     3     1     2
3     4     3     2
4     1     1     1

I tried pd.factorize, but couldn't figure out how to apply it to the entire DataFrame at once; I could only do it one column at a time, with the following code: df = df.apply(lambda x: pd.factorize(x)[0]).

Doing it per row would also work, since the data is parsed row by row.

Upvotes: 1

Views: 71

Answers (3)

Dev Khadka

Reputation: 5451

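You can do it with apply, replacing each value with the sum of its two numbers. Note that values with the same sum, such as 0/2 and 1/1, end up with the same integer: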

import pandas as pd

df = pd.DataFrame([['0/0', '1/1', '0/0'], ['0/2', '0/0', '1/1'], ['1/2', '0/2', '1/1'], ['0/0', '0/0', '0/0']], columns=('col1', 'col2', 'col3'))

# Replace each "a/b" string with the integer a + b
df2 = df.apply(lambda s: [sum(map(int, x.split("/"))) for x in s])
# "0/0" sums to 0; relabel it as 1
df2[df2 == 0] = 1
df2

Result

   col1  col2  col3
0     1     2     1
1     2     1     2
2     3     2     2
3     1     1     1

Upvotes: 0

Chan

Reputation: 4301

Try this:

import pandas as pd

df = pd.DataFrame([['0/0', '1/1', '0/0'], ['0/2', '0/1', '1/1'], ['1/2', '0/2', '1/1'], ['0/0', '0/0', '0/0']])

# Map each distinct string to an integer (numbering follows set iteration order)
d = {n: m for m, n in enumerate(set(df.values.ravel()))}

df_new = df.replace(d)

Input:

     0    1    2
0  0/0  1/1  0/0
1  0/2  0/1  1/1
2  1/2  0/2  1/1
3  0/0  0/0  0/0

Output (the exact integers depend on set iteration order and can differ between runs):

   0  1  2
0  2  4  2
1  1  3  4
2  0  1  4
3  2  2  2
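
If the numbering needs to be reproducible, one option is pd.factorize on the flattened values. The following is only a minimal sketch, assuming the question's sample data; the names df_ids, codes and uniques are illustrative. Flattening row by row means IDs are handed out in order of first appearance, and adding 1 makes them start at 1 as in the question's desired output.

```python
import pandas as pd

# Sample data from the question (assumed layout: X as the index).
df = pd.DataFrame(
    [["0/0", "1/1", "0/0"],
     ["0/2", "0/0", "1/1"],
     ["1/2", "0/2", "1/1"],
     ["0/0", "0/0", "0/0"]],
    columns=["col1", "col2", "col3"],
    index=pd.Index([1, 2, 3, 4], name="X"),
)

# Flatten row by row so codes follow the order of first appearance.
codes, uniques = pd.factorize(df.to_numpy().ravel())

# factorize numbers from 0; add 1 so the IDs start at 1.
df_ids = pd.DataFrame(
    codes.reshape(df.shape) + 1,
    index=df.index,
    columns=df.columns,
)
print(df_ids)
```

Here uniques holds the distinct strings in ID order, so uniques[i - 1] recovers the original string for ID i.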

Upvotes: 0

Andy L.

Reputation: 25239

Use df.rank with method='dense'. Each unique string is assigned a unique number (its rank) within its column, so the same string can get different numbers in different columns:

df_final = df.set_index('X').rank(method='dense').astype(int)

Out[244]:
   col1  col2  col3
X
1     1     3     1
2     2     1     2
3     3     2     2
4     1     1     1
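
By default rank works column by column. If the same string should get the same number in every column, one possibility is to rank all values together. This is a rough sketch rather than a tested recipe for the real data; it assumes the question's sample frame, and the name ranked is illustrative. The values are stacked into one long Series, ranked, and reshaped back.

```python
import pandas as pd

# Sample data from the question, with X as an ordinary column.
df = pd.DataFrame({
    "X": [1, 2, 3, 4],
    "col1": ["0/0", "0/2", "1/2", "0/0"],
    "col2": ["1/1", "0/0", "0/2", "0/0"],
    "col3": ["0/0", "1/1", "1/1", "0/0"],
})

ranked = (
    df.set_index("X")
      .stack()                # one long Series holding every value
      .rank(method="dense")   # dense rank over all values at once
      .astype(int)
      .unstack()              # back to the original row/column layout
)
print(ranked)
```

The numbers follow the sorted order of the strings ('0/0' < '0/2' < '1/1' < '1/2'), not the order of first appearance.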

Upvotes: 1
