Nelson Chung

Reputation: 87

How do I subset a dataframe by criteria when some values in a column are strings and others in the same column are integers?

I have a data frame called dc with a column 'SEX' that has 92201 rows.

When I try to subset all the 1s, dc1num=dc[dc['SEX']==1] the new dataframe produces len(dc1num)= 47614 rows.

When I try to subset all the 0s, dc0num=dc[dc['SEX']==0], the new dataframe produces len(dc0num)= 40492 rows.

When I try to subset as a string all the 1s, dc1str=dc[dc['SEX']=='1'], the new dataframe produces len(dc1str)= 2130 rows.

When I try to subset as a string all the 0s, dc0str=dc[dc['SEX']=='0'], the new dataframe produces len(dc0str)= 1965 rows.

They add up to 47614+40492+2130+1965 = 92201 rows, exactly the number of rows in the original dataset. So obviously some of the ones are coded 1 and some '1'; some of the zeroes are coded 0 and some '0'.

I gather from this information that some rows in this dataframe column are coded as integers, and some as strings.
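One way to confirm this diagnosis is to count the Python type of each value in the column. The sketch below uses a small made-up column rather than the real data, so the values are only an assumption about what dc['SEX'] looks like:

```python
import pandas as pd

# Hypothetical miniature of the 'SEX' column: a mix of int and str values.
dc = pd.DataFrame({"SEX": [1, 0, "1", "0", 1]})

# An object dtype is a hint that the column may hold mixed Python types.
print(dc["SEX"].dtype)

# Count how many values of each Python type the column contains.
type_counts = dc["SEX"].map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

If more than one type name shows up in the counts, the column is mixed and needs a conversion before filtering.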

I want to subset all 1s and 0s, so that

len(dc1)= 49,744

and

len(dc0)= 42,457

I tried to make them all strings with dc.SEX.apply(str0) and then ran dc1=dc[dc['SEX']=='1'] and dc0=dc[dc['SEX']=='0'], but this gave the same results as before; the conversion had no effect. How should I go about resolving this issue?

Upvotes: 2

Views: 588

Answers (5)

user2844967

Reputation:

Solution 1: Convert all the values in the column to integer

df['col1']=df['col1'].astype(int)

        (OR)

import pandas as pd
df['col1']=pd.to_numeric(df['col1'])

Solution 2: Convert all the values in the column to string

Example: df['col1']=df.col1.apply(str)

One of the above solutions should work.

Upvotes: 0

U13-Forward

Reputation: 71570

A way that also works when some values are not integer-like (anything that cannot be parsed as a number becomes NaN):

df['SEX'] = pd.to_numeric(df['SEX'], errors='coerce')
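A minimal sketch of what errors='coerce' does, on a made-up column that includes one non-numeric entry (the value 'unknown' is an assumption for illustration, not something reported in the question):

```python
import pandas as pd

# Hypothetical column: ints, numeric strings, and one value that is not a number.
df = pd.DataFrame({"SEX": [1, "0", "1", 0, "unknown"]})

# errors='coerce' turns anything unparseable into NaN instead of raising.
df["SEX"] = pd.to_numeric(df["SEX"], errors="coerce")

ones = df[df["SEX"] == 1]
zeros = df[df["SEX"] == 0]
unparsed = df[df["SEX"].isna()]
print(len(ones), len(zeros), len(unparsed))
```

Because NaN is a float, the converted column ends up with a float dtype, so it is worth checking the NaN rows before dropping or filling them.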

Upvotes: 0

Kris

Reputation: 420

To coerce the data to string format, try astype as below and assign the result back to the column; the way you're calling apply doesn't work.

df['A'] = df['A'].astype(str)
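A sketch of why the conversion may have had no effect, assuming the call in the question was meant to be apply(str) and that its result was never assigned back (the small column here is made up):

```python
import pandas as pd

# Hypothetical miniature of the mixed 'SEX' column.
dc = pd.DataFrame({"SEX": [1, 0, "1", "0"]})

# apply returns a new Series; without assignment the column is unchanged.
dc["SEX"].apply(str)
before = len(dc[dc["SEX"] == "1"])

# Assigning the converted Series back replaces the column.
dc["SEX"] = dc["SEX"].astype(str)
after = len(dc[dc["SEX"] == "1"])

print(before, after)
```

Only the second filter picks up both the integer 1 and the string '1', because by then every value in the column is a string.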

Upvotes: 0

ansev

Reputation: 30920

Use:

dc['SEX']=dc['SEX'].astype(int)
# or dc['SEX']=dc['SEX'].astype(float)

and then:

dc1num=dc[dc['SEX']==1]
dc0num=dc[dc['SEX']==0]

You can also do:

for i,group in dc.groupby('SEX'):
    print(group)
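As a rough illustration of the groupby route, the sketch below converts a made-up mixed column and then collects the size of each group instead of printing it:

```python
import pandas as pd

# Hypothetical miniature of the mixed 'SEX' column.
dc = pd.DataFrame({"SEX": [1, 0, "1", "0", 1]})

# Convert first, so that 1 and '1' end up in the same group.
dc["SEX"] = dc["SEX"].astype(int)

# Each group is a sub-DataFrame holding the rows for one value of SEX.
sizes = {int(key): len(group) for key, group in dc.groupby("SEX")}
print(sizes)
```

After the conversion there are only two groups, one per distinct value.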

Upvotes: 2

BENY

Reputation: 323226

Usually we can do a one-time conversion:

df.SEX=pd.to_numeric(df.SEX)

Then we can split the df

df1=df.query('SEX==1')
df2=df.query('SEX==0')
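To check the approach against the numbers in the question, here is a sketch on a synthetic column built to match the four reported counts (the real data is not available, so the construction is an assumption). The expected totals are 47614 + 2130 = 49744 ones and 40492 + 1965 = 42457 zeros:

```python
import pandas as pd

# Synthetic column built to match the counts reported in the question.
values = [1] * 47614 + [0] * 40492 + ["1"] * 2130 + ["0"] * 1965
df = pd.DataFrame({"SEX": values})

# One-time conversion: ints stay ints, numeric strings are parsed.
df["SEX"] = pd.to_numeric(df["SEX"])

df1 = df.query("SEX == 1")
df0 = df.query("SEX == 0")
print(len(df1), len(df0))
```

The two subsets together cover all 92201 rows, which is a quick sanity check that no value was left unconverted.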

Upvotes: 0
