Reputation: 25
I've an json array with data like
[{
'Id': 2837
'Date': datetime.date(2020, 01, 01),
'Caption': 'asdf'
}, {
'Id': 2838
'Date': datetime.date(2020, 01, 01),
'Caption': 'fasd'
},{
'Id': 2839
'Date': datetime.date(2020, 01, 04),
'Caption': 'sdfa'
}]
and a dataframe with a row for each date
DATE LEN
0 2020-01-01 NaN
1 2020-01-02 NaN
2 2020-01-03 NaN
3 2020-01-04 NaN
4 2020-01-05 NaN
5 2020-01-06 NaN
Now I would like to count the elements in the json array and write them into the dataframe. I tried
df['LEN'] = len([x for x in jsonArray if x['Date'] == df['DATE']])
but it's not working. I get an error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What am I doing wrong?
Upvotes: 1
Views: 3218
Reputation: 2137
You can do it easily with join and groupby
Code
df_json = pd.DataFrame(jsonArray)
df_json['Date'] = df_json['Date'].astype(str)
df = df.set_index('Date').join(df_json.set_index('Date'), how='left', on='Date')
grp_df = df.reset_index().groupby('Date', as_index=False)['Id'].count()
grp_df.rename(columns={'Id': 'Len'}, inplace=True)
Output
Date Len
0 2020-01-01 2
1 2020-01-02 0
2 2020-01-03 0
3 2020-01-04 1
4 2020-01-05 0
5 2020-01-06 0
Upvotes: 0
Reputation: 5183
You can parse jsonArray
into a dataframe and use groupby
to count the rows for each date.
Sample data
import datetime
jsonArray = ([
{'Id': 2837, 'Date': datetime.date(2020, 1, 1), 'Caption': 'asdf'},
{'Id': 2838, 'Date': datetime.date(2020, 1, 1), 'Caption': 'fasd'},
{'Id': 2839, 'Date': datetime.date(2020, 1, 4), 'Caption': 'sdfa'},
])
df_dates = pd.DataFrame({
'DATE': pd.date_range('2020-01-01', '2020-01-06'),
'LEN': float('nan')
})
The code
# parse jsonArray to dataframe
df = pd.DataFrame(jsonArray)
# count rows for each Date
df = df.groupby('Date').size().rename('LEN')
# set DATE to index so `df.update` can align
df_dates.set_index('DATE', inplace=True)
df_dates.update(df)
Output
LEN
DATE
2020-01-01 2.0
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 1.0
2020-01-05 NaN
2020-01-06 NaN
Upvotes: 1
Reputation: 7852
# necessary imports
import datetime
import pandas as pd
Lets say you have this list:
ourlist = \
[{
'Id': 2837,
'Date': datetime.date(2020, 1, 1),
'Caption': 'asdf'
},{
'Id': 2838,
'Date': datetime.date(2020, 1, 1),
'Caption': 'fasd'
},{
'Id': 2839,
'Date': datetime.date(2020, 1, 4),
'Caption': 'sdfa'
}]
and a dataframe, for example:
df = pd.DataFrame([datetime.datetime(2020,1,1),
datetime.datetime(2020,1,2),
datetime.datetime(2020,1,3),
datetime.datetime(2020,1,4)],
columns=['DATE'])
Then you can do:
df['LEN'] = df['DATE'].apply(lambda datetimeobject: sum(datetimeobject == dates for dates in [eachdict['Date'] for eachdict in ourlist]))
or, more concisely, using How can I count the occurrences of a list item?:
df['LEN'] = df['DATE'].apply(lambda datetimeobject: [eachdict['Date'] for eachdict in ourlist].count(datetimeobject))
(In both cases, I am using a list comprehension to get the Date
entries from ourlist
, a list of dictionaries.)
Both of them will result in a df
:
DATE LEN
0 2020-01-01 2
1 2020-01-02 0
2 2020-01-03 0
3 2020-01-04 1
Upvotes: 0