Reputation: 87
I have a DataFrame called dc with a column 'SEX'; the frame has 92201 rows.
When I subset all the 1s with dc1num=dc[dc['SEX']==1], len(dc1num) is 47614 rows.
When I subset all the 0s with dc0num=dc[dc['SEX']==0], len(dc0num) is 40492 rows.
When I subset the string '1' with dc1str=dc[dc['SEX']=='1'], len(dc1str) is 2130 rows.
When I subset the string '0' with dc0str=dc[dc['SEX']=='0'], len(dc0str) is 1965 rows.
Together they add up to 47614 + 40492 + 2130 + 1965 = 92201 rows, exactly the number in the original dataset. So some of the ones are coded as 1 and some as '1'; likewise, some of the zeros are coded as 0 and some as '0'.
I gather from this information that some rows in this dataframe column are coded as integers, and some as strings.
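A quick way to confirm this diagnosis is to count the Python types stored in the column. The sketch below uses a small made-up frame standing in for dc (the values are hypothetical, not the real data):

```python
import pandas as pd

# Hypothetical toy frame standing in for dc: 'SEX' mixes ints and strings
dc = pd.DataFrame({"SEX": [1, 0, "1", 1, "0", 0, 1]})

# A column holding both ints and strs gets the generic 'object' dtype
print(dc["SEX"].dtype)  # object

# Count the Python type of every value in the column
type_counts = dc["SEX"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# The int 1 and the string '1' compare unequal, so each mask misses the other kind
print((dc["SEX"] == 1).sum(), (dc["SEX"] == "1").sum())  # 3 1
```

If type_counts shows both int and str, the column is mixed and every comparison against a single value will miss some rows.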
I want to subset all the 1s and all the 0s, so that
len(dc1) = 49,744 (47614 + 2130)
and
len(dc0) = 42,457 (40492 + 1965)
I tried to make them all strings with dc.SEX.apply(str) and then tried dc1=dc[dc['SEX']=='1'] and dc0=dc[dc['SEX']=='0'], but this gave the same result as before; it didn't change anything. How should I go about resolving this issue?
Upvotes: 2
Views: 588
Reputation:
Solution 1: Convert all the values in the column to integers:
df['col1']=df['col1'].astype(int)
(OR)
import pandas as pd
df['col1']=pd.to_numeric(df['col1'])
Solution 2: Convert all the values in the column to strings, for example:
df['col1']=df.col1.apply(str)
Either of the above solutions should work.
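A minimal sketch of both solutions on a made-up mixed column (hypothetical data; the column name col1 is the one used in this answer). Either way, 1 and '1' end up as the same value:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 0, "1", 1, "0", 0, 1]})  # hypothetical mixed column

# Solution 1: everything becomes an integer ('1' -> 1, '0' -> 0)
as_int = df["col1"].astype(int)
as_num = pd.to_numeric(df["col1"])

# Solution 2: everything becomes a string (1 -> '1', 0 -> '0')
as_str = df["col1"].apply(str)

# Each version now matches all four "one" rows with a single comparison
print((as_int == 1).sum(), (as_str == "1").sum())  # 4 4
```

Remember to assign the converted Series back to the column (df['col1'] = ...), as the answer does; the conversion methods return a new Series rather than changing df in place.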
Upvotes: 0
Reputation: 71570
A way that also works when some values aren't integer-like (anything that can't be parsed as a number becomes NaN):
df['SEX'] = pd.to_numeric(df['SEX'], errors='coerce')
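To illustrate what errors='coerce' buys you, here is a sketch with a hypothetical stray value 'unknown' in the column. With the default errors='raise', pd.to_numeric would raise a ValueError on it instead:

```python
import pandas as pd

# Hypothetical column with one non-numeric entry
df = pd.DataFrame({"SEX": [1, "0", "1", "unknown", 0]})

# errors='coerce' turns anything unparseable into NaN instead of raising
df["SEX"] = pd.to_numeric(df["SEX"], errors="coerce")
print(df["SEX"].isna().sum())  # 1

# NaN rows match neither comparison, so they fall out of both subsets
print(len(df[df["SEX"] == 1]), len(df[df["SEX"] == 0]))  # 2 2
```

Because NaN forces the column to a float dtype, the comparisons still work (1.0 == 1 is True); check isna() afterwards if you need to know how many rows were dropped.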
Upvotes: 0
Reputation: 420
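To convert the data to strings, try the line below. The way you're calling apply doesn't work because apply returns a new Series instead of changing dc in place, so the result has to be assigned back to the column: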
df['A'] = df['A'].astype(str)
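A small sketch (hypothetical toy data) showing why the call in the question had no effect and how assigning the result back fixes it:

```python
import pandas as pd

dc = pd.DataFrame({"SEX": [1, 0, "1", 1, "0", 0, 1]})  # hypothetical toy data

# apply() returns a NEW Series; calling it without assigning leaves dc unchanged
dc["SEX"].apply(str)
print((dc["SEX"] == "1").sum())  # 1: only the row that was already the string '1'

# Assign the result back so the column really holds strings
dc["SEX"] = dc["SEX"].astype(str)
dc1 = dc[dc["SEX"] == "1"]
dc0 = dc[dc["SEX"] == "0"]
print(len(dc1), len(dc0))  # 4 3
```

One caveat: if the numbers were stored as floats, str(1.0) gives '1.0', which would not match '1'; converting to numbers instead avoids that.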
Upvotes: 0
Reputation: 30920
Use:
dc['SEX']=dc['SEX'].astype(int)
# or dc['SEX']=dc['SEX'].astype(float)
and then:
dc1num=dc[dc['SEX']==1]
dc0num=dc[dc['SEX']==0]
You can also do:
for i, group in dc.groupby('SEX'):
    print(group)
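If you want to keep the subsets rather than just print them, a sketch like the following collects the groupby output into a dict keyed by the SEX value (the age column and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: 'age' is a made-up extra column to show groups keep all columns
dc = pd.DataFrame({"SEX": [1, 0, "1", 1, "0", 0, 1],
                   "age": [30, 41, 25, 52, 38, 29, 60]})

# Unify the types first, as this answer does
dc["SEX"] = dc["SEX"].astype(int)

# groupby yields (key, sub-frame) pairs; keep each sub-frame in a dict
groups = {key: group for key, group in dc.groupby("SEX")}
dc1, dc0 = groups[1], groups[0]
print(len(dc1), len(dc0))  # 4 3
```

Without the astype(int) step, groupby would produce separate groups for 1 and '1', which is the same problem as in the question.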
Upvotes: 2
Reputation: 323226
Usually a one-time conversion is enough:
df.SEX=pd.to_numeric(df.SEX)
Then we can split the DataFrame:
df1=df.query('SEX==1')
df2=df.query('SEX==0')
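A sketch of this approach on a made-up mixed column, checking that every row lands in one of the two subsets once the types are unified:

```python
import pandas as pd

df = pd.DataFrame({"SEX": [1, 0, "1", 1, "0", 0, 1]})  # hypothetical mixed column

# One-time conversion: '1' -> 1 and '0' -> 0, plain ints stay as they are
df["SEX"] = pd.to_numeric(df["SEX"])

# query() evaluates the expression string against the column names
df1 = df.query('SEX==1')
df2 = df.query('SEX==0')

# After the conversion no rows are lost between the two subsets
print(len(df1), len(df2), len(df1) + len(df2) == len(df))  # 4 3 True
```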
Upvotes: 0