Reputation: 2587
I am currently creating my data frame with the below commands
getting data from url
...
devices = get_device_data.json()
device_data = devices["data"]
p_dev = pd.DataFrame(device_data)
However the json dictionary has 60 columns in, of which I only want two of them, is there a way to specify which columns to include when creating a data frame, or any way I can achieve the desired result?
Thanks
EDIT: some sample data with some columns deleted. I actually only want the id and hostname columns
[
{
"id": 474378238
"account": "https: //www.****.com/api/v2/accounts/38021/",
"bsid": None,
"carrier": "BOB",
"carrier_id": "BOB BOB",
"channel": None,
"connection_state": "connected",
"gsn": "356853050758871",
"homecarrid": "BOB",
"hostname": "BOB1345345",
"is_asset": True,
"is_gps_supported": True,
"is_upgrade_available": False,
"is_upgrade_supported": True,
"ltebandwidth": "20 MHz",
"mac": None,
"serial": "356853050758871",
"service_type": "LTE",
"ssid": None,
"summary": "connected",
"txchannel": "19667",
"type": "mdm",
"uid": "f5a8da8f",
"updated_at": "2018-08-17T11:19:57.019938+00:00",
"uptime": 86412.8558200002,
},
{
"id": 5674657356
"account": "https: //www.****.com/api/v2/accounts/38021/",
"bsid": None,
"carrier": "BOB",
"carrier_id": "BOB BOB",
"channel": None,
"connection_state": "connected",
"gsn": "356853050758871",
"homecarrid": "BOB",
"hostname": "BOB10765",
"is_asset": True,
"is_gps_supported": True,
"is_upgrade_available": False,
"is_upgrade_supported": True,
"ltebandwidth": "20 MHz",
"mac": None,
"serial": "356853050758871",
"service_type": "LTE",
"ssid": None,
"summary": "connected",
"txchannel": "19667",
"type": "mdm",
"uid": "f5a8da8f",
"updated_at": "2018-08-17T11:19:57.019938+00:00",
"uptime": 86412.8558200002,
},
{
"id": 5674657465
"account": "https: //www.****.com/api/v2/accounts/38021/",
"bsid": None,
"carrier": "BOB",
"carrier_id": "BOB BOB",
"channel": None,
"connection_state": "connected",
"gsn": "356853050758871",
"homecarrid": "BOB",
"hostname": "BOB10453453",
"is_asset": True,
"is_gps_supported": True,
"is_upgrade_available": False,
"is_upgrade_supported": True,
"ltebandwidth": "20 MHz",
"mac": None,
"serial": "356853050758871",
"service_type": "LTE",
"ssid": None,
"summary": "connected",
"txchannel": "19667",
"type": "mdm",
"uid": "f5a8da8f",
"updated_at": "2018-08-17T11:19:57.019938+00:00",
"uptime": 86412.8558200002,
},
{
"id": 9756756756
"account": "https: //www.****.com/api/v2/accounts/38021/",
"bsid": None,
"carrier": "BOB",
"carrier_id": "BOB BOB",
"channel": None,
"connection_state": "connected",
"gsn": "356853050758871",
"homecarrid": "BOB",
"hostname": "BOB100133",
"is_asset": True,
"is_gps_supported": True,
"is_upgrade_available": False,
"is_upgrade_supported": True,
"ltebandwidth": "20 MHz",
"mac": None,
"serial": "356853050758871",
"service_type": "LTE",
"ssid": None,
"summary": "connected",
"txchannel": "19667",
"type": "mdm",
"uid": "f5a8da8f",
"updated_at": "2018-08-17T11:19:57.019938+00:00",
"uptime": 86412.8558200002,
},
]
Upvotes: 1
Views: 1606
Reputation: 49
You can do it by deleting those columns from the data frame by the column header.
Example: p_dev.drop(['Column_1','Column_2','column_2'], axis = 1, inplace = True)
Another Way to do it to write only those column headers in a list which are needed and then overwrite the existing dataframe
Example:
col_list = ['Column_1', 'Column_3']
p_def=p_def[col_list]
Upvotes: 0
Reputation: 863301
Use list comprehension
with dict comprehension
for filtering by colums names:
L
is list of input data
device_data = [{k: v for k, v in x.items() if k in ['type','id']} for x in L]
print (device_data)
[{'id': 474378238, 'type': 'mdm'}, {'id': 5674657356, 'type': 'mdm'},
{'id': 5674657465, 'type': 'mdm'}, {'id': 9756756756, 'type': 'mdm'}]
df = pd.DataFrame(device_data)
print (df)
id type
0 474378238 mdm
1 5674657356 mdm
2 5674657465 mdm
3 9756756756 mdm
Upvotes: 2