MysterioProgrammer91
Reputation: 569

Fill zeros in a DataFrame with values from another DataFrame that has the same column name and row index

Hi there, I have two large pandas DataFrames, df and df1.

df looks something like this; each cell holds either a value or 0.

df:
              Amazon  Apple  Blackberry  Yahoo  Google
    1/1/2000       0     13           0     42       0
    1/1/2001       0     41           0     53       0
    1/1/2002      34     42           0     64       0
    1/1/2003      45     63           0     74       0
    1/1/2004      43     74          24     75       0
    1/1/2005       0     89          25     86      25

I also have df1, which shares some column names and row index values with df, e.g.:

df1:
              Amazon  Apple  Blackberry  Yahoo  Google
    1/1/2000       0      0           0     53      53
    1/1/2001      24      0          53     53      42
    1/1/2002      42     31          53     53      74
    1/1/2003      52     43          53      0      89
    1/1/2004      52     53           0      0      99
    1/1/2005      24     53           0      0     100

I want to start from df and keep all of its non-zero values. However, wherever df has a 0, I want to look up df1 and copy its value into df, provided the column name and index date are the same.

E.g. the output would be something like this:

dfoutput
              Amazon  Apple  Blackberry  Yahoo  Google
    1/1/2000       0     13           0     42      53
    1/1/2001      24     41          53     53      42
    1/1/2002      34     42          53     64      74
    1/1/2003      45     63          53     74      89
    1/1/2004      43     74          24     75      99
    1/1/2005      24     89          25     86      25

If a cell in df has no match in df1, the value should remain 0 in df. In the real, larger dataset, df and df1 do differ slightly.

Thanks.

Upvotes: 1

Views: 39

Answers (1)

jezrael
Reputation: 862921

You can replace 0 with NaN in both DataFrames and then use combine_first (or fillna):

import numpy as np

# treat 0 as missing in both frames, then fill the gaps in df from df1
df2 = df.replace(0, np.nan).combine_first(df1.replace(0, np.nan))
# alternatively
# df2 = df.replace(0, np.nan).fillna(df1.replace(0, np.nan))
print(df2)
          Amazon  Apple  Blackberry  Yahoo  Google
1/1/2000     NaN     13         NaN     42    53.0
1/1/2001    24.0     41        53.0     53    42.0
1/1/2002    34.0     42        53.0     64    74.0
1/1/2003    45.0     63        53.0     74    89.0
1/1/2004    43.0     74        24.0     75    99.0
1/1/2005    24.0     89        25.0     86    25.0

# same idea with mask, which turns every cell where the condition is True into NaN
df2 = df.mask(df == 0).combine_first(df1.mask(df1 == 0))
# alternatively
# df2 = df.mask(df == 0).fillna(df1.mask(df1 == 0))
print(df2)
          Amazon  Apple  Blackberry  Yahoo  Google
1/1/2000     NaN     13         NaN     42    53.0
1/1/2001    24.0     41        53.0     53    42.0
1/1/2002    34.0     42        53.0     64    74.0
1/1/2003    45.0     63        53.0     74    89.0
1/1/2004    43.0     74        24.0     75    99.0
1/1/2005    24.0     89        25.0     86    25.0
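
Since the question says df and df1 differ slightly in the real data, the choice between combine_first and fillna may matter there. A minimal sketch, assuming two hypothetical toy frames (`left` and `right`, not taken from the question) where the second one has an extra row and an extra column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames: "right" has one extra row (r3) and one extra column (B).
left = pd.DataFrame({"A": [0, 2]}, index=["r1", "r2"])
right = pd.DataFrame({"A": [5, 6, 7], "B": [8, 9, 10]}, index=["r1", "r2", "r3"])

# Treat 0 as missing in both frames.
left_nan = left.replace(0, np.nan)
right_nan = right.replace(0, np.nan)

# combine_first returns the union of both frames' rows and columns.
combined = left_nan.combine_first(right_nan)

# fillna only fills holes in the caller, so the rows and columns of "left" are kept.
filled = left_nan.fillna(right_nan)

print(combined.shape)  # (3, 2)
print(filled.shape)    # (2, 1)
```

So if the result should contain only the rows and columns of df, the fillna variant is probably the closer fit; combine_first also brings in labels that exist only in df1.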

Finally, replace NaN with 0 and cast to int:

df2 = df2.fillna(0).astype(int)
print (df2)
          Amazon  Apple  Blackberry  Yahoo  Google
1/1/2000       0     13           0     42      53
1/1/2001      24     41          53     53      42
1/1/2002      34     42          53     64      74
1/1/2003      45     63          53     74      89
1/1/2004      43     74          24     75      99
1/1/2005      24     89          25     86      25
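
Putting the steps together, here is a self-contained sketch that rebuilds the sample data from the question and checks the result. The helper name `fill_zeros_from` is hypothetical, and the index is assumed to hold plain strings rather than parsed dates:

```python
import numpy as np
import pandas as pd


def fill_zeros_from(primary: pd.DataFrame, fallback: pd.DataFrame) -> pd.DataFrame:
    """Keep primary's values; where primary is 0, use fallback's value at the same labels."""
    primary_nan = primary.replace(0, np.nan)
    fallback_nan = fallback.replace(0, np.nan)
    # fillna aligns on index and columns and keeps primary's shape;
    # cells with no usable match go back to 0 before the cast to int.
    return primary_nan.fillna(fallback_nan).fillna(0).astype(int)


index = ["1/1/2000", "1/1/2001", "1/1/2002", "1/1/2003", "1/1/2004", "1/1/2005"]

df = pd.DataFrame(
    {
        "Amazon": [0, 0, 34, 45, 43, 0],
        "Apple": [13, 41, 42, 63, 74, 89],
        "Blackberry": [0, 0, 0, 0, 24, 25],
        "Yahoo": [42, 53, 64, 74, 75, 86],
        "Google": [0, 0, 0, 0, 0, 25],
    },
    index=index,
)

df1 = pd.DataFrame(
    {
        "Amazon": [0, 24, 42, 52, 52, 24],
        "Apple": [0, 0, 31, 43, 53, 53],
        "Blackberry": [0, 53, 53, 53, 0, 0],
        "Yahoo": [53, 53, 53, 0, 0, 0],
        "Google": [53, 42, 74, 89, 99, 100],
    },
    index=index,
)

dfoutput = fill_zeros_from(df, df1)
print(dfoutput)
```

This uses the fillna variant so that the output keeps exactly the rows and columns of df; swapping in combine_first would give the same values on this sample data because both frames share all labels.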

Upvotes: 2
