Reputation:
So, I have a pandas dataframe with large no. of rows. whose one row might look like this and more.
data_store.iloc[0]
Out[5]:
mac_address 00:03:7f:05:c0:06
Visit 1/4/2016
storeid Ritika - Bhubaneswar
Mall or not High Street
Address 794, Sahid Nagar, Janpath, Bhubaneswar-751007
Unnamed: 4 OR
Locality JanPath
Affluence Index Locality 4
Lifestyle Index 5
Tourist Attraction In the locality? 0
City Bhubaneswar
Pop Density City 2131
Population Density of City NaN
City Affluence Index Medium
Mall / Shopping Complex High Street
Mall Premiumness Index NaN
Multiplex NaN
Offices Nearby NaN
Food Court NaN
Average Footfall NaN
Average Rental NaN
Size of the mall NaN
Area NaN
Upscale Street NaN
Place of Worship in vicinity NaN
High Street / Mall High Street
Brand Premiumness Index 4
Restaurant Nearby? 0
Store Size Large
Area.1 2600
There may be some more value in place of Nan just take it as a example.Now the unique key here is mac_address so I want to start with a empty JSON document. now for each row of data i will update the JSON file. like
{
mac_address: "00:03:7f:05:c0:06"
{
"cities" : [
{
"City Name1" : "Wittenbergplatz",
"City count" : "12"
},
{
"City Name2" : "Spichernstrasse",
"City Count" : "19"
},
{
"City Name3" : "Weberwiese",
"City count" : "30"
}
]
}
}
city count is no. of times a mac_address visited to a city. By reading this particular row I would like to update a city named Bhubneswar and Count 1. Now for each new row i would like to check if it is already there in JSON for that probably i would have to import the JSON in python in dictionary or something(suggest).So, if a mac_address is already there i would like to update the info of that row in existing JSON across that mac_address and if it is not there i would like to add that mac_address as new field and update the info of that row across that mac_address. I have to do it in python and pandas dataframe as i have a bit idea about pandas dataframe. Any help on this?
Upvotes: 0
Views: 584
Reputation: 210832
your desired JSON file is not a valid JSON file, here is the error message thrown by the online JSON validator:
Input:
{
"mac_address": "00:03:7f:05:c0:06" {
"cities": [{
"City Name1": "Wittenbergplatz",
"City count": "12"
}, {
"City Name2": "Spichernstrasse",
"City Count": "19"
}, {
"City Name3": "Weberwiese",
"City count": "30"
}]
}
}
Error
Error: Parse error on line 2:
..."00:03:7f:05:c0:06" { "cities": [{
-----------------------^
Expecting 'EOF', '}', ':', ',', ']', got '{'
this solution might help you to start:
In [440]: (df.groupby(['mac_address','City'])
.....: .size()
.....: .reset_index()
.....: .rename(columns={0:'count'})
.....: .groupby('mac_address')
.....: .apply(lambda x: x[['City','count']].to_dict('r'))
.....: .to_dict()
.....: )
Out[440]:
{'00:03:7f:05:c0:01': [{'City': 'aaa', 'count': 1}],
'00:03:7f:05:c0:02': [{'City': 'bbb', 'count': 1}],
'00:03:7f:05:c0:03': [{'City': 'ccc', 'count': 2}],
'00:03:7f:05:c0:05': [{'City': 'xxx', 'count': 1},
{'City': 'zzz', 'count': 1}],
'00:03:7f:05:c0:06': [{'City': 'aaa', 'count': 1},
{'City': 'bbb', 'count': 1}],
'00:03:7f:05:c0:07': [{'City': 'aaa', 'count': 3},
{'City': 'bbb', 'count': 1}]}
data:
In [441]: df
Out[441]:
mac_address City
0 00:03:7f:05:c0:06 aaa
1 00:03:7f:05:c0:06 bbb
2 00:03:7f:05:c0:07 aaa
3 00:03:7f:05:c0:07 bbb
4 00:03:7f:05:c0:07 aaa
5 00:03:7f:05:c0:01 aaa
6 00:03:7f:05:c0:02 bbb
7 00:03:7f:05:c0:03 ccc
8 00:03:7f:05:c0:03 ccc
9 00:03:7f:05:c0:07 aaa
10 00:03:7f:05:c0:05 xxx
11 00:03:7f:05:c0:05 zzz
Upvotes: 1