Reputation: 233
I have a Dataframe df like this:
A B C D
2 1 O s h
4 2 P
7 3 Q
9 4 R h m
I have a function f to calculate C and D based on B for a row:
def f(p): #p is the value of column B for a row.
return p+'k', p+'n'
How can I populate the missing values for row 4&7 by applying the function f to the Dataframe?
The expected outcome is like below:
A B C D
2 1 O s h
4 2 P Pk Pn
7 3 Q Qk Qn
9 4 R h m
The function f has to be used as the real function is very complicated. Also, the function only needs to be applied to the rows missing C and D
Upvotes: 23
Views: 32816
Reputation: 91
I have a more easy way to do it, if the table is not so big.
def f(row): #row is the value of row.
if row['C']=='':
row['C']=row['B']+'k'
if row['D']=='':
row['D']=row['B']+'n'
return row
df=df.apply(f,axis=1)
Upvotes: 9
Reputation: 4540
I found this super confusing but eventually figured out a way of achieving this that didn't hurt my brain. Here it is, sorry if it doesn't match the example well...
dataframe with no index
# function to do the calcs
def f(row):
my_a = row['a'] # row is a Series, my_a is a scalar string
if my_a == 'a': # dummy logic to calc new values based on the row values
return [1, 2] # return 2 values to update 2 columns
else:
return [4, 5]
# simple test frame
input = pd.DataFrame.from_dict({
'a': ['a', 'd'],
'b': ['b', 'e'],
'c': ['c', 'f'],
'x': [0, 0],
'y': [0, 0]
})
# apply the function to update the x and y columns with the returned values
input[['x','y']] = input.apply(f, axis=1)
dataframe with an index
if your dataframe has an index.. you need to be a bit more explicit when you are doing the apply to ensure that "list-like results will be turned into columns"...
def f(row): # function to do the calcs
my_a = row['a'] # row is a Series, my_a is a scalar string
my_index = row.name # you might also want to use the index value in the calcs
if my_a == 'a': # dummy logic to calc new values based on the row values
return [1, 2] # return 2 values to update 2 columns
else:
return [4, 5]
input = pd.DataFrame.from_dict({
'an_index': ['indx1', 'indx2'],
'a': ['a', 'd'],
'b': ['b', 'e'],
'c': ['c', 'f'],
'x': [0, 0],
'y': [0, 0]
}).set_index(['an_index'])
# apply the function to update the x and y columns with the returned values
input[['x','y']] = input.apply(f, axis=1, result_type='expand')
Upvotes: 1
Reputation: 31171
If you want to use your function as such, here is a one liner:
df.update(df.B.apply(lambda x: pd.Series(dict(zip(['C','D'],f(x))))), overwrite=False)
In [350]: df
Out[350]:
A B C D
2 1 O s h
4 2 P Pk Pn
7 3 Q Qk Qn
9 4 R h m
You can also do:
df1 = df.copy()
df[['C','D']] = df.apply(lambda x: pd.Series([x['B'] + 'k', x['B'] + 'n']), axis=1)
df1.update(df, overwrite=False)
Upvotes: 11
Reputation: 5414
simply by doing the following
df.C.loc[df.C.isnull()] = df.B.loc[df.C.isnull()] + 'k'
df.D.loc[df.D.isnull()] = df.B.loc[df.D.isnull()] + 'n'
check this link indexing-view-versus-copy if you want to know why I've use loc
Upvotes: 0
Reputation: 21552
Maybe there is a more elegant way, but I would do in this way:
df['C'] = df['B'].apply(lambda x: f(x)[0])
df['D'] = df['B'].apply(lambda x: f(x)[1])
Applying the function to the columns and get the first and the second value of the outputs. It returns:
A B C D
0 1 O Ok On
1 2 P Pk Pn
2 3 Q Qk Qn
3 4 R Rk Rn
EDIT:
In a more concise way, thanks to this answer:
df[['C','D']] = df['B'].apply(lambda x: pd.Series([f(x)[0],f(x)[1]]))
Upvotes: 20