Reputation: 477
I have this code, which manipulates a data set to create a new column by pulling information from an existing column. To match the data properly in a pd.merge with another data set, I would like to convert the 'Channel ID' column to integers. Despite the .astype(int) call, the column's dtype still shows up as float64 when I inspect the frame with .info().
import re
import pandas as pd

def cost(received_frame):
    received_frame.columns = ['Campaign', 'Ad Spend']
    campaigns = received_frame['Campaign']
    ID = []
    for c in campaigns:
        # Split the campaign name on underscores and keep any 6-digit block
        blocks = re.split('_', c)
        for block in blocks[1:]:
            if len(block) == 6 and block.isdigit():
                ID.append(block)
    ID = pd.Series(ID).str.replace("'","")
    ID = pd.DataFrame(ID)
    both = [ID,received_frame]
    frame = pd.concat(both,axis=1)
    frame.columns = ['Channel ID', 'Campaign', 'Ad Spend']
    # This is the line that still leaves the column as float64
    frame['Channel ID'] = frame['Channel ID'].dropna().astype(int)
    return frame
Upvotes: 3
Views: 5431
Reputation: 879501
Suppose frame looks like this:
import numpy as np
import pandas as pd
frame = pd.DataFrame({'Channel ID':['1',np.nan,'2'], 'foo':['bar','baz',np.nan]})
  Channel ID  foo
0          1  bar
1        NaN  baz
2          2  NaN
You could drop the rows of frame where Channel ID is NaN:
mask = pd.notnull(frame['Channel ID'])
frame = frame.loc[mask]
and then astype(int) will successfully convert the column to an integer dtype:
frame['Channel ID'] = frame['Channel ID'].astype(int)
yields
  Channel ID  foo
0          1  bar
2          2  NaN
As Ami Tavory explained, you can't drop the NaNs solely from frame['Channel ID'] with
frame['Channel ID'] = frame['Channel ID'].dropna()
because, upon assignment, pandas aligns the index of the right-hand side with the matching rows on the left-hand side. The assignment does not remove the rows on the left whose index is missing from the right-hand side, so the NaNs remain in the bigger DataFrame, frame.
Since NaN is a float value, the dtype must remain a float dtype as long as the column contains NaNs.
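To make the alignment concrete, here is a minimal sketch that reuses the example frame from above. The name converted is only introduced for illustration. The intermediate Series really does have an integer dtype, but assigning it back reintroduces a NaN for the row that was dropped:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Channel ID': ['1', np.nan, '2'],
                      'foo': ['bar', 'baz', np.nan]})

# dropna() removes index label 1, so this Series only has labels 0 and 2
# and can hold plain integers.
converted = frame['Channel ID'].dropna().astype(int)
print(converted.index.tolist())   # labels that survived dropna()
print(converted.dtype)            # an integer dtype

# Assigning back aligns on the index: label 1 has no value on the
# right-hand side, so it is filled with NaN and the column becomes float.
frame['Channel ID'] = converted
print(frame['Channel ID'].dtype)
```

The row count of frame never changes in this sketch, which is why filtering the whole DataFrame (as with the mask above) behaves differently from filtering a single column.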
Upvotes: 3
Reputation: 76297
When you write
frame['Channel ID'].dropna().astype(int)
you're returning a Series with possibly fewer index labels, because you're dropping the NaNs.
Then, when you assign it back as
frame['Channel ID'] = frame['Channel ID'].dropna().astype(int)
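pandas performs a sort of merge with the existing rows (according to the index). The rows that were dropped have no value on the right-hand side, so they come back as NaN, and since NaN is a float, the whole column is converted to float.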
You should replace the dropna() with something else, depending on your problem (fillna?).
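As a rough sketch of what "something else" could look like, here are two options that keep every row. Both rest on assumptions: the sentinel value -1 is purely hypothetical and only makes sense if it can never be a real channel ID, and the nullable 'Int64' dtype needs a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Channel ID': ['1', np.nan, '2'],
                      'foo': ['bar', 'baz', np.nan]})

# Parse the strings first; missing entries stay NaN, so this is float64.
numeric = pd.to_numeric(frame['Channel ID'])

# Option 1: replace NaN with a sentinel (hypothetical choice: -1),
# after which the column can be cast to a plain integer dtype.
filled = numeric.fillna(-1).astype(int)

# Option 2: pandas' nullable integer dtype keeps the missing value as <NA>
# while storing the other entries as integers.
nullable = numeric.astype('Int64')

print(filled.tolist())
print(nullable.dtype)
```

Which one fits depends on the merge: with a sentinel, make sure the other data set has no matching key, otherwise the rows without an ID would be joined to it by accident.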
Upvotes: 5