Reputation: 350
I have the following dataframe:
id variable quantitative_v cualitative_v
a gdp 1 Nan
a exports 12 Nan
a imports 43 Nan
a category Nan A
a developed Nan Yes
b gdp 1 Nan
b exports 12 Nan
b imports 43 Nan
b category Nan A
b developed Nan Yes
The df I expect:
id variable value
a gdp 1
a exports 12
a imports 43
a category A
a developed Yes
b gdp 1
b exports 12
b imports 43
b category A
b developed Yes
How could I do that? I tried to sum, but that didn't work because they're different type of data
Upvotes: 0
Views: 1370
Reputation: 31146
Your data is structured such that NaN is showing in one column or the other. Better to update one column from the other.
df = pd.DataFrame({"id":["a","a","a","a","a","b","b","b","b","b"],"variable":["gdp","exports","imports","category","developed","gdp","exports","imports","category","developed"],"quantitative_v":[1,12,43,"Nan"," Nan",1,12,43,"Nan"," Nan"],"cualitative_v":["Nan"," Nan"," Nan","A","Yes","Nan"," Nan"," Nan","A","Yes"]})
df = df.replace({"Nan":np.nan, " Nan":np.nan}).reset_index()
mask = df["quantitative_v"].isna()
df.loc[mask,("quantitative_v")] = df[mask]["cualitative_v"]
df = df.drop("cualitative_v",1).rename({"quantitative_v":"value"},axis=1)
print(df.to_string(index=False))
output
index id variable value
0 a gdp 1
1 a exports 12
2 a imports 43
3 a category A
4 a developed Yes
5 b gdp 1
6 b exports 12
7 b imports 43
8 b category A
9 b developed Yes
Upvotes: 0
Reputation: 13255
Use fillna
or combine_first
:
df['quantitative_v'] = df['quantitative_v'].astype(float)
df['quantitative_v'].fillna(df['cualitative_v'])
df['quantitative_v'].combine_first(df['cualitative_v'])
Upvotes: 1