Python, Merging rows with the same id - date but different values in one column

Question

I have selected the rows that are and aren't mentioning 'Korona', and counted them by date. Some of the dates don't have Korona True. The dataframe looks like:

TABLE 1

	Published_date	Korona	Count
242	2020-06-01	False	13
243	2020-06-01	True	3
244	2020-06-02	False	7
245	2020-06-02	True	1
246	2020-06-03	False	11
247	2020-06-04	False	8
248	2020-06-04	True	1
249	2020-06-05	False	10
250	2020-06-06	False	5
251	2020-06-07	False	5
252	2020-06-08	False	14

What I'm trying to do is remove duplicate date rows, but transform the value to another column. So for example this:

	Published_date	Korona	Count
242	2020-06-01	False	13
243	2020-06-01	True	3

Looks like this (after some renaming of columns and adding an Count-All column):

TABLE 2

	Published_date	Count-NoKorona	Count-Korona	Count-All
152	2020-06-01	13	3	16

And I'm doing that with this code (found it on Python, Merging rows with same value in one column ) :

df = df.set_index(['Published_date', df.groupby('Published_date').cumcount()])['Count'].unstack().add_prefix('Count').reset_index()

Problem: For some reason after that line of code, my data gets mixed up.

Before that line everything was fine. I had 782 Korona True rows (For a test I only took 'True' rows from the table 1, and summed up the Count of it, and it was correct --> 782). So 782/3443 True.

After the code line I get a sum of 1011/3443.

I'm guessing it takes some wrong values of dates, or it gets mixed up, but I don't know how to fix it, and the data table is too big to found the mistakes manualy to try and understand the problem better.

I would be grateful for any help. (Also sorry if the question doesn't look okay, it's my first :D)

Jonathan Leon · Accepted Answer

Give this a try using pivot table

d = ''' Published_date  Korona  Count
242 2020-06-01  False   13
243 2020-06-01  True    3
244 2020-06-02  False   7
245 2020-06-02  True    1
246 2020-06-03  False   11
247 2020-06-04  False   8
248 2020-06-04  True    1
249 2020-06-05  False   10
250 2020-06-06  False   5
251 2020-06-07  False   5
252 2020-06-08  False   14'''

df = pd.read_csv(io.StringIO(d), sep='\s+', engine='python')

# pivot the data and reset the index
df1 = pd.pivot_table(df, values='Count', index=['Published_date'],
                    columns=['Korona'], aggfunc=np.sum, fill_value=0).reset_index()
# rename the columns to what you want
df1.columns = ['Published_date', 'Count-NoKorona', 'Count-Korona']
# sum the values into a new column
df1['Count-All'] = df1[['Count-NoKorona', 'Count-Korona']].sum(axis=1)

Output:

  Published_date  Count-NoKorona  Count-Korona  Count-All
0     2020-06-01              13             3         16
1     2020-06-02               7             1          8
2     2020-06-03              11             0         11
3     2020-06-04               8             1          9
4     2020-06-05              10             0         10
5     2020-06-06               5             0          5
6     2020-06-07               5             0          5
7     2020-06-08              14             0         14

Python, Merging rows with the same id - date but different values in one column

Answers (1)

Related Questions