Reputation: 631
I am experimenting with PERF_EVENTS, a performance event interface provided by the Linux kernel. I was able to read performance counters (CPU cycles, ...) through the perf_event_open syscall:
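#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so call it via syscall(). */
static long
perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu,
                   group_fd, flags);
}

int
main(int argc, char **argv)
{
    struct perf_event_attr pe;
    long long count;
    int fd;

    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_CPU_CYCLES;
    pe.disabled = 1;                 /* start stopped; enabled explicitly below */
    pe.exclude_idle = 1;
    pe.exclude_kernel = 1;
    pe.exclude_callchain_kernel = 1;

    /* pid = 0, cpu = -1: this thread on any CPU; group_fd = -1: no group */
    fd = perf_event_open(&pe, 0, -1, -1, 0);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n",
                (unsigned long long)pe.config);
        exit(EXIT_FAILURE);
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    printf("Measuring CPU cycles for this printf\n");
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(long long)) != sizeof(long long)) {
        fprintf(stderr, "Error reading counter\n");
        exit(EXIT_FAILURE);
    }
    printf("%lld\n", count);
    close(fd);
    return 0;
}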
However, I don't fully understand how to use perf_event_open. I am blindly passing -1 as the 4th parameter. I don't know when to group events, when to keep them separate, or which one of them should be the group "leader".
Below is the documentation for the 4th parameter:
The group_fd argument allows event groups to be created. An event group has one event which is the group leader. The leader is created first, with group_fd = -1. The rest of the group members are created with subsequent perf_event_open() calls with group_fd being set to the fd of the group leader. (A single event on its own is created with group_fd = -1 and is considered to be a group with only 1 member.) An event group is scheduled onto the CPU as a unit: it will only be put onto the CPU if all of the events in the group can be put onto the CPU. This means that the values of the member events can be meaningfully compared, added, divided (to get ratios), etc., with each other, since they have counted events for the same set of executed instructions.
So can anyone shed some light on the 4th parameter (and, if possible, its relation to the 5th)? What is the proper way of doing things? An example would also help a lot.
Upvotes: 0
Views: 520
Reputation: 1512
I'm not certain about the flags, but I can give some colour on the group, though I don't know if this adequately answers your question rather than just rephrases the documentation you've quoted.
The CPU has only a small number of hardware counters^, so access to them has to be shared. Your events may therefore be taken off the physical counters and put back on them as the OS decides, from time to time, who gets to use the underlying hardware.
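As a rough sketch of how that sharing shows up in practice: if an event is opened with read_format set to PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING, read() returns the raw count followed by how long the event was enabled and how long it was actually on a hardware counter. A count that was time-shared can then be extrapolated. The struct and helper names below are my own, not part of the kernel API, and the linear extrapolation is only an estimate.

```c
#include <stdint.h>

/* Layout read() returns for a single (non-group) event opened with
 * read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
 *               PERF_FORMAT_TOTAL_TIME_RUNNING.
 * The struct name is hypothetical; the field order follows the man page. */
struct timed_reading {
    uint64_t value;         /* raw event count */
    uint64_t time_enabled;  /* ns the event was enabled */
    uint64_t time_running;  /* ns the event was actually on a counter */
};

/* Hypothetical helper: estimate what the count would have been had the
 * event been on the hardware for the whole time it was enabled. */
static double scaled_count(const struct timed_reading *r)
{
    if (r->time_running == 0)
        return 0.0;  /* never scheduled, nothing to extrapolate from */
    return (double)r->value * (double)r->time_enabled
                            / (double)r->time_running;
}
```

If time_running equals time_enabled, the event was never taken off the hardware and the raw value can be used as is.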
Some measurements you might make only make sense if you measure two counters at the same time: for example number of branches taken vs number of branches mispredicted.
In order to ensure your two counters are scheduled on and off the CPU by the operating system together, rather than independently, you need to make a group. One of them is created first as the leader (with group_fd = -1), and the other is then created with the leader's fd as its group_fd.
Then you know that any counts you read are from times when both counters were enabled and running together.
^ Apart from a few common events that have dedicated fixed counters, such as core cycles and instructions retired, most Intel CPUs can only measure about four event types at a time, out of a palette of many hundreds.
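Here is a minimal sketch of the branches vs. mispredicted branches case, assuming the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS and PERF_COUNT_HW_BRANCH_MISSES events are available on your machine. The helper names (open_hw_counter, count_branches, branchy_work) are made up for illustration. The leader is opened with group_fd = -1 and starts disabled; the second event passes the leader's fd as group_fd. I leave flags at 0 since I'm not sure about them. With PERF_FORMAT_GROUP, one read() on the leader returns both values.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open one hardware counter for the calling thread.
 * group_fd = -1 creates a leader; a leader's fd joins that leader's group. */
static int open_hw_counter(uint64_t config, int group_fd)
{
    struct perf_event_attr pe;
    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = config;
    pe.disabled = (group_fd == -1);  /* members follow the leader's state */
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;
    pe.read_format = PERF_FORMAT_GROUP;
    return (int)syscall(__NR_perf_event_open, &pe, 0, -1, group_fd, 0UL);
}

/* What read() on the leader returns with PERF_FORMAT_GROUP alone. */
struct group_reading {
    uint64_t nr;         /* number of events in the group */
    uint64_t values[2];  /* one count per event, leader first */
};

/* Hypothetical workload with a data-dependent branch. */
static void branchy_work(void *arg)
{
    volatile unsigned *sink = arg;
    for (unsigned i = 0; i < 100000; i++)
        if (i % 3 == 0)
            *sink += i;
}

/* Runs work(arg) with both counters active as one group.
 * Returns 0 on success, -1 if the counters could not be opened or read. */
static int count_branches(void (*work)(void *), void *arg,
                          uint64_t *branches, uint64_t *misses)
{
    int leader = open_hw_counter(PERF_COUNT_HW_BRANCH_INSTRUCTIONS, -1);
    if (leader == -1)
        return -1;
    int member = open_hw_counter(PERF_COUNT_HW_BRANCH_MISSES, leader);
    if (member == -1) {
        close(leader);
        return -1;
    }

    /* PERF_IOC_FLAG_GROUP applies the ioctl to every event in the group. */
    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    work(arg);
    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    struct group_reading r;
    ssize_t n = read(leader, &r, sizeof(r));
    close(member);
    close(leader);
    if (n != (ssize_t)sizeof(r) || r.nr != 2)
        return -1;
    *branches = r.values[0];
    *misses = r.values[1];
    return 0;
}
```

A caller would do something like count_branches(branchy_work, &sink, &b, &m) and, on success, compute (double)m / b as the misprediction ratio. Opening can fail in virtual machines or when /proc/sys/kernel/perf_event_paranoid is restrictive, so check the return value.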
Upvotes: 1