I'm trying to apply a transform to a groupby object in pandas.
Here's the code:
import uuid
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['012', '013', '014', '014', '015', '015', '016', '016', '017', '017'],
    'date': pd.to_datetime(
        ['2008-11-05', 'NaT', 'NaT', '2008-11-05', 'NaT', '2008-11-05',
         'NaT', '2008-11-05', 'NaT', '2008-11-05']),
    'grade': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
              np.nan, np.nan],
    'length': [1, 2, 3, 4, 5, 6, 7, 8, np.nan, 10]})
df['uuid'] = np.nan
df
Out[7]:
id date grade length uuid
0 012 2008-11-05 NaN 1.0 NaN
1 013 NaT NaN 2.0 NaN
2 014 NaT NaN 3.0 NaN
3 014 2008-11-05 NaN 4.0 NaN
4 015 NaT NaN 5.0 NaN
5 015 2008-11-05 NaN 6.0 NaN
6 016 NaT NaN 7.0 NaN
7 016 2008-11-05 NaN 8.0 NaN
8 017 NaT NaN NaN NaN
9 017 2008-11-05 NaN 10.0 NaN
In[8]:
df.groupby(['id', 'date']).uuid.transform(lambda g: uuid.uuid4())
Out[9]:
...
...
ValueError: Length mismatch: Expected axis has 5 elements, new values have 10 elements
Similar to this question, I assume the problem is with the NaT in the date column, so I tried df.fillna('nan'). Unfortunately, this threw the same error. Is this because the date column recognises the string 'nan' as a np.nan?
I tried filling with a string, 'nullv', which got me 'ValueError: could not convert string to Timestamp'.
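To check the NaT assumption, here is a minimal sketch using only the id and date columns from the frame above. By default, groupby leaves out rows whose key contains a missing value, so counting the groups shows how many rows actually take part. The name group_sizes is just for illustration.

```python
import pandas as pd

# Same id/date columns as the frame in the question
df = pd.DataFrame({
    'id': ['012', '013', '014', '014', '015', '015', '016', '016', '017', '017'],
    'date': pd.to_datetime(
        ['2008-11-05', 'NaT', 'NaT', '2008-11-05', 'NaT', '2008-11-05',
         'NaT', '2008-11-05', 'NaT', '2008-11-05']),
})

# Rows whose key contains NaT are excluded from every group by default
group_sizes = df.groupby(['id', 'date']).size()

print(len(df))            # number of rows in the frame
print(len(group_sizes))   # number of groups that groupby formed
print(group_sizes.sum())  # number of rows that landed in some group
```

Only the five rows with a real date end up in a group, which looks consistent with the "5 elements" versus "10 elements" in the error message.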
So, my current solution looks like:
df['uuid'] = np.nan
df.date = df.date.astype('str')
df.uuid = df.groupby(['id', 'date']).uuid.transform(lambda g: uuid.uuid4())
df.date = pd.to_datetime(df.date)
df
Out[9]:
id date grade length uuid
0 012 2008-11-05 NaN 1.0 267b9c5f-41d9-4a8c-91af-aaa2dbddc911
1 013 NaT NaN 2.0 0e7ae8fa-cf64-4c3a-abd8-85d40b6253a4
2 014 NaT NaN 3.0 d1de91d8-099e-492c-8434-94ebd269280f
3 014 2008-11-05 NaN 4.0 91b42203-1a31-4dfe-8566-abba3686734f
4 015 NaT NaN 5.0 6a83b025-98c4-4196-8bfb-1ca88e426d8b
5 015 2008-11-05 NaN 6.0 d0ba9dfc-fa2b-4a1f-995b-66f798bd9259
6 016 NaT NaN 7.0 67a26331-03de-440e-8958-89a375007535
7 016 2008-11-05 NaN 8.0 ca94c6f2-1520-4162-94cf-cf4536fb8828
8 017 NaT NaN NaN 133da892-a0ef-4fa3-9557-14049e8f3b66
9 017 2008-11-05 NaN 10.0 4a19db2b-0166-45e0-aff0-54f83b479507
There's surely another way than converting to string and back again?
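One possible way to avoid the string round trip, sketched under my own assumptions rather than taken from any accepted answer: pd.factorize labels missing values with -1, so NaT turns into an ordinary integer code that can serve as part of a grouping key. The names id_codes, date_codes, keys and uuid_by_key are hypothetical, and the frame is a cut-down version of the one above.

```python
import uuid
import pandas as pd

# Cut-down example: rows 2 and 3 share the same (id, NaT) key
df = pd.DataFrame({
    'id': ['014', '014', '015', '015'],
    'date': pd.to_datetime(['NaT', '2008-11-05', 'NaT', 'NaT']),
})

# factorize labels missing values with -1, so NaT becomes an ordinary integer code
id_codes = pd.factorize(df['id'])[0]
date_codes = pd.factorize(df['date'])[0]

# One (id code, date code) key per row; rows sharing a key belong to the same group
keys = list(zip(id_codes, date_codes))

# Generate one UUID per distinct key and reuse it for repeated keys
uuid_by_key = {}
for key in keys:
    if key not in uuid_by_key:
        uuid_by_key[key] = str(uuid.uuid4())

# The date column is never converted, so it keeps its datetime dtype
df['uuid'] = [uuid_by_key[key] for key in keys]
```

Note that this puts all NaT rows for the same id into one group, which matches what the astype('str') workaround does. pandas 1.1 and later also accept dropna=False in groupby, which may be simpler where it is available; it is not used here so that the sketch does not depend on the pandas version.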