When developing C++ code, I often find myself trying to do something for all the data members belonging to a class. Classic examples are in the copy constructor and assignment operator. Another place is when implementing serialization functions. In all of these situations, I have spent a lot of time tracking down bugs in large codebases where someone has added a data member to a class but has not added its usage to one of these functions where it is needed.
With C++11, 14, 17, and 20, there are many template programming techniques that are quite sophisticated in what they can do. Unfortunately, I understand only a little of template metaprogramming. I am hoping that someone can point me to a way to specify a list of variables (as a class member and/or type) that I can use to help reduce errors where someone has inadvertently left a member out. I am okay with both a compile-time and run-time penalty, as long as there is an easy way at build time to specify whether or not to use such instrumentation.
A notional usage might look like:
class Widget {
template <typename Archive> void serialize(Archive ar) {
auto myvl = vl(); // make a new list from the one defined in the constructor
ar(a);
ar(x);
myvl.pop(a);
myvl.pop(x);
// a run-time check that would be violated because s is still in myvl.
if (!myvl.empty())
throw std::string{"ill-defined serialize method; an expected variable was not used"};
// even better would be a compile-time check
}
private:
int a;
double x;
std::string s;
VariableList vl(a, x, s);
};
Or perhaps some static analysis, or ...
I am just looking for a way to improve the quality of my code. Thanks for any help.
Upvotes: 6
PART 1 of 2 (see part 2 below)
I decided to write a special tool that uses Clang's AST (abstract syntax tree).
Since you're working on Windows, the following instructions are for Windows.
As far as I found, the Clang library (SDK) is very Linux-oriented, and it is difficult to use it straight from sources on Windows. That's why I decided to use the binary distribution of Clang to solve your task.
LLVM for Windows can be downloaded from the GitHub releases page; the current release is 11.0.1. To use it on Windows, download LLVM-11.0.1-win64.exe and install it to some folder. In my example I installed it into C:/bin/llvm/.
Visual Studio also ships its own Clang, which can be used as well, but it is a bit outdated, so very new C++20 features may not be supported.
Find clang++.exe in your LLVM installation; in my case it is C:/bin/llvm/bin/clang++.exe. This path is set in the c_clang variable at the beginning of the script.
I wrote the parsing tool in Python, as it is a well-known and popular scripting language. The script parses the console output of Clang's AST dump. You can download Python from here.
The AST can also be parsed and processed at the C++ level using Clang's SDK (an example of an AST visitor implementation is located here), but that SDK is probably only convenient to use on Linux. That's why I chose the binary Windows distribution and parsing of its console output. The binary distribution on Linux can also be used with my script.
You can try my script online on a Linux server by clicking the "Try it online!" link below.
The script is run as python script.py prog.cpp, which produces prog.cpp.json containing the parsed tree of namespaces and classes.
Under the hood, the script uses the command clang++ -cc1 -ast-dump prog.cpp to parse the .cpp file into an AST. You can run this command manually to see what it outputs; for example, part of the output looks like this:
..................
|-CXXRecordDecl 0x25293912570 <line:10:13, line:13:13> line:10:19 class P definition
| |-DefinitionData pass_in_registers standard_layout trivially_copyable trivial literal
| | |-DefaultConstructor exists trivial needs_implicit
| | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveConstructor exists simple trivial needs_implicit
| | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveAssignment exists simple trivial needs_implicit
| | `-Destructor simple irrelevant trivial needs_implicit
| |-CXXRecordDecl 0x25293912690 <col:13, col:19> col:19 implicit class P
| |-FieldDecl 0x25293912738 <line:11:17, col:30> col:30 x 'const char *'
| `-FieldDecl 0x252939127a0 <line:12:17, col:22> col:22 y 'bool'
..............
I parse this output to produce a JSON file, which looks like this (partial):
.............
{
"node": "NamespaceDecl",
"name": "ns2",
"loc": "line:3:5, line:18:5",
"tree": [
{
"node": "CXXRecordDecl",
"type": "struct",
"name": "R",
"loc": "line:4:9, line:6:9",
"tree": [
{
"node": "FieldDecl",
"type": "bool *",
"name": "pb",
"loc": "line:5:13, col:20"
}
]
},
.............
As you can see, the JSON file has the following fields: node gives Clang's name for the node, which can be NamespaceDecl for a namespace, CXXRecordDecl for a struct/class/union, or FieldDecl for a field (data member) of a struct. You can easily find open-source JSON parsers for C++ if you need one, because JSON is a simple and widely supported format for structured data.
The JSON also has the fields name (the name of the namespace/class/field), type (the type of the class or field), loc (the location of the namespace/class/field definition inside the file), and tree (a list of child nodes: for a namespace node the children are other namespaces or classes; for a class node they are fields or nested classes).
My program also prints a simplified form to the console: just the list of classes (with fully qualified names including namespaces) plus their fields. For my example input .cpp it prints:
ns1::ns2::R - pb
ns1::ns2::S::P - x y
ns1::ns2::S::Q - r
ns1::ns2::S - i j b
Example input .cpp used:
// Start
namespace ns1 {
namespace ns2 {
struct R {
bool * pb;
};
struct S {
int i, j;
bool b;
class P {
char const * x;
bool y;
};
class Q {
R r;
};
};
}
}
int main() {
}
I also tested my script on a fairly complex .cpp file with thousands of lines and dozens of classes.
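The simplified class/field listing shown earlier is produced by a recursive walk over the tree (flat_tree in the script). If you would rather do that step in C++, here is a minimal sketch of the same walk. Node is a hypothetical in-memory mirror of the JSON nodes (only node, name and tree), and loading the JSON into it is left to whichever JSON library you choose.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory mirror of one JSON node (only the fields the walk needs)
struct Node {
    std::string node;        // "NamespaceDecl", "CXXRecordDecl", "FieldDecl", ...
    std::string name;
    std::vector<Node> tree;  // child nodes
};

// Same walk as the script's flat_tree: one "qualified::Name - f1 f2" line per class with fields.
// Nested classes are emitted before their enclosing class, as in the script's output.
void flat_tree(const std::vector<Node>& tree, const std::string& prefix,
               std::vector<std::string>& out) {
    std::string fields;
    for (const Node& e : tree) {
        if (e.node == "NamespaceDecl" || e.node == "CXXRecordDecl") {
            const std::string qualified = prefix.empty() ? e.name : prefix + "::" + e.name;
            flat_tree(e.tree, qualified, out);
        } else if (e.node == "FieldDecl") {
            fields += fields.empty() ? e.name : " " + e.name;
        } else {
            flat_tree(e.tree, prefix, out);  // e.g. TranslationUnitDecl: descend, keep prefix
        }
    }
    if (!fields.empty()) out.push_back(prefix + " - " + fields);
}
```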
You can use my script as follows: once your C++ project is ready, run my script on your .cpp files. From its output you can see which classes you have and which fields each class has. Then you can check whether this list of fields matches what your serialization code handles; you could write simple macros to automate that check. I think getting the list of fields is the main feature you need. Running my script can be a preprocessing stage before compilation.
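As a hedged illustration of such a check, here is a minimal C++ sketch. It assumes the expected field list is pasted from the script's output (for example the line ns1::ns2::S - i j b), and that serialize() is written through a hypothetical AR macro that hands the archive each member together with its spelled name, so a recording archive can see which members were handled. NameRecorder, AR, kExpectedS and missing_fields are illustrative names, not part of any library.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical archive that only records the names of the members it receives
struct NameRecorder {
    std::vector<std::string> names;
    template <class T>
    void operator()(const char* name, const T&) { names.push_back(name); }
};

// Hypothetical macro: passes the member's spelled name along with its value
#define AR(ar, member) ar(#member, member)

struct S {
    int i, j;
    bool b;

    template <class Archive>
    void serialize(Archive& ar) const {
        AR(ar, i);
        AR(ar, j);
        // b is deliberately "forgotten" to show the check catching it
    }
};

// Expected members of S, e.g. copied from the script's line "ns1::ns2::S - i j b"
inline const std::vector<std::string> kExpectedS = {"i", "j", "b"};

// Returns the expected members that serialize() never passed to the archive
template <class T>
std::vector<std::string> missing_fields(const T& obj, const std::vector<std::string>& expected) {
    NameRecorder recorder;
    obj.serialize(recorder);
    const std::set<std::string> seen(recorder.names.begin(), recorder.names.end());
    std::vector<std::string> missing;
    for (const std::string& name : expected)
        if (seen.count(name) == 0) missing.push_back(name);
    return missing;
}
```

Calling missing_fields from a unit test (and compiling the recording path only under a build flag) would be one way to get the on/off instrumentation the question asks for.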
If you don't know Python well, or want to suggest improvements to my code, tell me and I'll update it!
import subprocess, re, sys, json, tempfile, secrets

c_file = ''                              # optionally hard-code the .cpp file to parse
c_clang = 'C:/bin/llvm/bin/clang++.exe'  # clang++ from your LLVM installation

def get_ast(fname, *, enc = 'utf-8', opts = [], preprocessed = False, ignore_clang_errors = True):
    # Run Clang's textual AST dump, optionally preprocessing into a temporary file first
    try:
        if not preprocessed:
            fnameo = fname
            r = subprocess.run([c_clang, '-cc1', '-ast-dump'] + opts + [fnameo], capture_output = True)
            assert r.returncode == 0
        else:
            with tempfile.TemporaryDirectory() as td:
                fnameo = str(td) + '/' + secrets.token_hex(8).upper()
                r = subprocess.run([c_clang, '-E'] + opts + ['-o', fnameo, fname], capture_output = True)
                assert r.returncode == 0
                r = subprocess.run([c_clang, '-cc1', '-ast-dump', fnameo], capture_output = True)
                assert r.returncode == 0
    except AssertionError:
        # Clang reported errors; by default keep whatever AST it still dumped
        if not ignore_clang_errors:
            sys.stderr.write(r.stderr.decode(enc)); sys.stderr.flush()
            raise
    return r.stdout.decode(enc), fnameo

def proc_file(fpath, fout = None, *, clang_opts = [], preprocessed = False, ignore_clang_errors = True):
    def set_tree(tree, path, **value):
        # Create the nodes along path (tuple of (index, node kind)) and store value in the last one
        assert len(path) > 0
        if len(tree) <= path[0][0]:
            tree.extend([{} for i in range(path[0][0] - len(tree) + 1)])
        if 'node' not in tree[path[0][0]]:
            tree[path[0][0]]['node'] = path[0][1]
        if 'tree' not in tree[path[0][0]] and len(path) > 1:
            tree[path[0][0]]['tree'] = []
        if len(path) > 1:
            set_tree(tree[path[0][0]]['tree'], path[1:], **value)
        elif len(path) == 1:
            tree[path[0][0]].update(value)

    def clean_tree(tree):
        # Drop the placeholder {} entries left for AST nodes we are not interested in
        if type(tree) is list:
            for i in range(len(tree) - 1, -1, -1):
                if tree[i] == {}:
                    tree[:] = tree[:i] + tree[i+1:]
            for e in tree:
                clean_tree(e)
        elif 'tree' in tree:
            clean_tree(tree['tree'])

    def flat_tree(tree, name = (), fields = ()):
        # Print "qualified::Name - field1 field2 ..." for every class that has fields
        for e in tree:
            if e['node'] == 'NamespaceDecl':
                if 'tree' in e:
                    flat_tree(e['tree'], name + (e['name'],), ())
            elif e['node'] == 'CXXRecordDecl':
                if 'tree' in e:
                    flat_tree(e['tree'], name + (e['name'],), ())
            elif e['node'] == 'FieldDecl':
                fields = fields + (e['name'],)
                assert 'tree' not in e
            elif 'tree' in e:
                flat_tree(e['tree'], name, ())
        if len(fields) > 0:
            print('::'.join(name), ' - ', ' '.join(fields), sep = '')

    # src is the file Clang actually parsed (a temp file when preprocessed); fpath stays the input
    ast, src = get_ast(fpath, opts = clang_opts, preprocessed = preprocessed, ignore_clang_errors = ignore_clang_errors)
    ipath, path, tree = [], (), []
    st = lambda **value: set_tree(tree, path, **value)
    for line in ast.splitlines():
        debug = (path, line)
        if not line.strip():
            continue
        # The tree-drawing prefix ("| |-", "`-", ...) encodes the depth, two characters per level
        m = re.fullmatch(r'^([|`\- ]*)(\S+)(?:\s+.*)?$', line)
        assert m, debug
        assert len(m.group(1)) % 2 == 0, debug
        indent = len(m.group(1)) // 2
        node = m.group(2)
        debug = (node,) + debug
        if indent >= len(path) - 1:
            assert indent in [len(path), len(path) - 1], debug
        while len(ipath) <= indent:
            ipath += [-1]
        ipath = ipath[:indent + 1]
        ipath[indent] += 1
        path = path[:indent] + ((ipath[indent], node),)
        # Source range such as <line:10:13, line:13:13>
        line_col = None
        m = re.fullmatch(r'^.*\<((?:(?:' + re.escape(src) + r'|line|col)\:\d+(?:\:\d+)?(?:\, )?){1,2})\>.*$', line)
        if m:
            line_col = m.group(1).replace(src, 'line')
        changed = False
        if node == 'NamespaceDecl':
            m = re.fullmatch(r'^.+?\s+?(\S+)\s*$', line)
            assert m, debug
            st(name = m.group(1))
            changed = True
        elif node == 'CXXRecordDecl' and line.rstrip().endswith(' definition') and ' implicit ' not in line:
            m = re.fullmatch(r'^.+?\s+(union|struct|class)\s+(?:(\S+)\s+)?definition\s*$', line)
            assert m, debug
            st(type = m.group(1), name = m.group(2))
            changed = True
        elif node == 'FieldDecl':
            m = re.fullmatch(r'^.+?\s+(\S+?)\s+\'(.+?)\'\s*$', line)
            assert m, debug
            st(type = m.group(2), name = m.group(1))
            changed = True
        if changed and line_col is not None:
            st(loc = line_col)
    clean_tree(tree)
    if fout is None:
        fout = fpath + '.json'
    assert fout.endswith('.json'), fout
    with open(fout, 'wb') as f:
        f.write(json.dumps(tree, indent = 4).encode('utf-8'))
    flat_tree(tree)

if __name__ == '__main__':
    if c_file:
        proc_file(c_file)
    else:
        assert len(sys.argv) > 1
        proc_file(sys.argv[1])
JSON output:
[
{
"node": "TranslationUnitDecl",
"tree": [
{
"node": "NamespaceDecl",
"name": "ns1",
"loc": "line:2:1, line:19:1",
"tree": [
{
"node": "NamespaceDecl",
"name": "ns2",
"loc": "line:3:5, line:18:5",
"tree": [
{
"node": "CXXRecordDecl",
"type": "struct",
"name": "R",
"loc": "line:4:9, line:6:9",
"tree": [
{
"node": "FieldDecl",
"type": "bool *",
"name": "pb",
"loc": "line:5:13, col:20"
}
]
},
{
"node": "CXXRecordDecl",
"type": "struct",
"name": "S",
"loc": "line:7:9, line:17:9",
"tree": [
{
"node": "FieldDecl",
"type": "int",
"name": "i",
"loc": "line:8:13, col:17"
},
{
"node": "FieldDecl",
"type": "int",
"name": "j",
"loc": "col:13, col:20"
},
{
"node": "FieldDecl",
"type": "bool",
"name": "b",
"loc": "line:9:13, col:18"
},
{
"node": "CXXRecordDecl",
"type": "class",
"name": "P",
"loc": "line:10:13, line:13:13",
"tree": [
{
"node": "FieldDecl",
"type": "const char *",
"name": "x",
"loc": "line:11:17, col:30"
},
{
"node": "FieldDecl",
"type": "bool",
"name": "y",
"loc": "line:12:17, col:22"
}
]
},
{
"node": "CXXRecordDecl",
"type": "class",
"name": "Q",
"loc": "line:14:13, line:16:13",
"tree": [
{
"node": "FieldDecl",
"type": "ns1::ns2::R",
"name": "r",
"loc": "line:15:17, col:19"
}
]
}
]
}
]
}
]
}
]
}
]
PART 2 of 2
Digging into Clang's sources, I found that Clang can dump the AST directly to JSON by specifying -ast-dump=json (see PART 1 above for context), so the PART 1 code is not very useful; the PART 2 code is a better solution. The full AST dump command is clang++ -cc1 -ast-dump=json prog.cpp.
I wrote a simple Python script that extracts basic information from the JSON dump, almost the same as in PART 1. Each output line contains the fully qualified struct/class/union name (including namespaces), then a space, then a |-separated list of fields, each written as the field type, then ;, then the field name. Modify the first lines of the script so that they point to your clang++.exe location (see PART 1).
The code below, which collects field names and types for all classes, could easily be implemented in C++ as well if desired. It could even be used at run time to provide useful meta-information, in your case checking whether all fields were serialized, and in the correct order. It only needs a JSON parser, which is available for practically every programming language.
This script is run the same way as the first one: python script.py prog.cpp.
import subprocess, json, sys

c_file = ''                              # optionally hard-code the .cpp file to parse
c_clang = 'C:/bin/llvm/bin/clang++.exe'  # clang++ from your LLVM installation

r = subprocess.run([c_clang, '-cc1', '-ast-dump=json', c_file or sys.argv[1]], check = False, capture_output = True)
text = r.stdout.decode('utf-8')
data = json.loads(text)

def flat_tree(tree, path = (), fields = ()):
    # Print "qualified::Name type;field|type;field" for every record that has fields
    is_rec = False
    if 'kind' in tree:
        if tree['kind'] == 'NamespaceDecl':
            path = path + (tree.get('name', '(anonymous namespace)'),)
        elif tree['kind'] == 'CXXRecordDecl':
            # unnamed structs/unions have no 'name' but can still contain fields
            path = path + (tree.get('name', '(anonymous)'),)
            is_rec = True
    if 'inner' in tree:
        for e in tree['inner']:
            if e.get('kind', None) == 'FieldDecl':
                assert is_rec
                fields = fields + ((e.get('name', ''), e.get('type', {}).get('qualType', '')),)
            else:
                flat_tree(e, path, ())
    if len(fields) > 0:
        print('::'.join(path), '|'.join([f'{e[1]};{e[0]}' for e in fields]))

flat_tree(data)
Output:
ns1::ns2::R bool *;pb
ns1::ns2::S::P const char *;x|bool;y
ns1::ns2::S::Q ns1::ns2::R;r
ns1::ns2::S int;i|int;j|bool;b
For input:
// Start
namespace ns1 {
namespace ns2 {
struct R {
bool * pb;
};
struct S {
int i, j;
bool b;
class P {
char const * x;
bool y;
};
class Q {
R r;
};
};
}
}
int main() {
}
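Picking up the remark that this field list can also be consumed from C++: below is a minimal sketch that parses one line of the output shown above and checks a recorded serialization order against it. It assumes the class name contains no spaces (true for this sample, not for every template specialization) and that your serializer can somehow record the names of the members it writes, which is not shown. ClassFields, parse_fields_line and serialized_in_order are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One parsed output line, e.g. "ns1::ns2::S int;i|int;j|bool;b"
struct ClassFields {
    std::string qualified_name;
    std::vector<std::pair<std::string, std::string>> fields;  // (type, name), declaration order
};

// Split at the first space, then at '|', then at the last ';' of each item
ClassFields parse_fields_line(const std::string& line) {
    ClassFields result;
    const std::size_t space = line.find(' ');
    result.qualified_name = line.substr(0, space);
    if (space == std::string::npos) return result;
    std::size_t start = space + 1;
    while (start <= line.size()) {
        std::size_t bar = line.find('|', start);
        if (bar == std::string::npos) bar = line.size();
        const std::string item = line.substr(start, bar - start);
        const std::size_t semi = item.rfind(';');
        result.fields.emplace_back(item.substr(0, semi), item.substr(semi + 1));
        start = bar + 1;
    }
    return result;
}

// True when the recorded member names match the declared members exactly, in the same order
bool serialized_in_order(const ClassFields& expected, const std::vector<std::string>& recorded) {
    if (expected.fields.size() != recorded.size()) return false;
    for (std::size_t i = 0; i < recorded.size(); ++i)
        if (expected.fields[i].second != recorded[i]) return false;
    return true;
}
```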
Partial example of Clang's JSON AST output:
...............
{
"id":"0x1600853a388",
"kind":"CXXRecordDecl",
"loc":{
"offset":189,
"line":10,
"col":19,
"tokLen":1
},
"range":{
"begin":{
"offset":183,
"col":13,
"tokLen":5
},
"end":{
"offset":264,
"line":13,
"col":13,
"tokLen":1
}
},
"name":"P",
"tagUsed":"class",
"completeDefinition":true,
"definitionData":{
"canPassInRegisters":true,
"copyAssign":{
"hasConstParam":true,
"implicitHasConstParam":true,
"needsImplicit":true,
"trivial":true
},
"copyCtor":{
"hasConstParam":true,
"implicitHasConstParam":true,
"needsImplicit":true,
"simple":true,
"trivial":true
},
"defaultCtor":{
"exists":true,
"needsImplicit":true,
"trivial":true
},
"dtor":{
"irrelevant":true,
"needsImplicit":true,
"simple":true,
"trivial":true
},
"isLiteral":true,
"isStandardLayout":true,
"isTrivial":true,
"isTriviallyCopyable":true,
"moveAssign":{
"exists":true,
"needsImplicit":true,
"simple":true,
"trivial":true
},
"moveCtor":{
"exists":true,
"needsImplicit":true,
"simple":true,
"trivial":true
}
},
"inner":[
{
"id":"0x1600853a4a8",
"kind":"CXXRecordDecl",
"loc":{
"offset":189,
"line":10,
"col":19,
"tokLen":1
},
"range":{
"begin":{
"offset":183,
"col":13,
"tokLen":5
},
"end":{
"offset":189,
"col":19,
"tokLen":1
}
},
"isImplicit":true,
"name":"P",
"tagUsed":"class"
},
{
"id":"0x1600853a550",
"kind":"FieldDecl",
"loc":{
"offset":223,
"line":11,
"col":30,
"tokLen":1
},
"range":{
"begin":{
"offset":210,
"col":17,
"tokLen":4
},
"end":{
"offset":223,
"col":30,
"tokLen":1
}
},
"name":"x",
"type":{
"qualType":"const char *"
}
},
{
"id":"0x1600853a5b8",
"kind":"FieldDecl",
"loc":{
"offset":248,
"line":12,
"col":22,
"tokLen":1
},
"range":{
"begin":{
"offset":243,
"col":17,
"tokLen":4
},
"end":{
"offset":248,
"col":22,
"tokLen":1
}
},
"name":"y",
"type":{
"qualType":"bool"
}
}
]
},
...............
Upvotes: 1
Is there a way to specify and use a list of all data members belonging to a C++ class
Yes, if you use a recent GCC compiler (GCC 10 as of early 2021): write a GCC plugin that does this.
See also the DECODER project and the Bismon software and this draft report.
I am just looking for a way to improve the quality of my code
Consider using tools like the Clang static analyzer or Frama-C++.
Upvotes: 0
There is no way to do this without reflection support. The alternative is to transform your custom struct into a tuple of references to its members and then use std::apply to operate on the elements of the tuple one by one. See CppCon 2016: "C++14 Reflections Without Macros, Markup nor External Tooling" for the details. Here are the concepts:
First, we need to detect the number of fields in your struct:
#include <cstddef>
#include <utility>

// Converts to a reference to any type; only used inside unevaluated requires-expressions (C++20)
template <auto I>
struct any_type {
    template <class T> constexpr operator T& () const noexcept;
    template <class T> constexpr operator T&&() const noexcept;
};

// Try aggregate-initializing T with N any_type values, counting N down from sizeof(T)
template <class T, auto... Is>
constexpr auto detect_fields_count(std::index_sequence<Is...>) noexcept {
    if constexpr (requires { T{any_type<Is>{}...}; }) return sizeof...(Is);
    else
        return detect_fields_count<T>(std::make_index_sequence<sizeof...(Is) - 1>{});
}

template <class T>
constexpr auto fields_count() noexcept {
    return detect_fields_count<T>(std::make_index_sequence<sizeof(T)>{});
}
Then we can transform your struct into a tuple according to the fields_count trait (to illustrate, I only support up to 8 fields):
#include <tuple>

// Bind all members with a structured binding and return a tuple of references to them
template <class S>
constexpr auto to_tuple(S& s) noexcept {
    if constexpr (constexpr auto count = fields_count<S>(); count == 8) {
        auto& [f0, f1, f2, f3, f4, f5, f6, f7] = s;
        return std::tie(f0, f1, f2, f3, f4, f5, f6, f7);
    } else if constexpr (count == 7) {
        auto& [f0, f1, f2, f3, f4, f5, f6] = s;
        return std::tie(f0, f1, f2, f3, f4, f5, f6);
    } else if constexpr (count == 6) {
        auto& [f0, f1, f2, f3, f4, f5] = s;
        return std::tie(f0, f1, f2, f3, f4, f5);
    } else if constexpr (count == 5) {
        auto& [f0, f1, f2, f3, f4] = s;
        return std::tie(f0, f1, f2, f3, f4);
    } else if constexpr (count == 4) {
        auto& [f0, f1, f2, f3] = s;
        return std::tie(f0, f1, f2, f3);
    } else if constexpr (count == 3) {
        auto& [f0, f1, f2] = s;
        return std::tie(f0, f1, f2);
    } else if constexpr (count == 2) {
        auto& [f0, f1] = s;
        return std::tie(f0, f1);
    } else if constexpr (count == 1) {
        auto& [f0] = s;
        return std::tie(f0);
    } else if constexpr (count == 0) {
        return std::tie();
    } else {
        static_assert(count <= 8, "to_tuple: add branches for more than 8 fields");
    }
}
Then you can use this utility in your own serialize functions:
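#include <string>

struct Widget {
    // Members must stay public (and Widget an aggregate) for the structured bindings to work
    int a;
    double x;
    std::string s;

    template <typename Archive>
    void serialize(Archive& ar) {
        // Capture by reference: a by-value capture is const inside a non-mutable lambda
        std::apply([&ar](auto&... member) { (ar(member), ...); }, to_tuple(*this));
    }
};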
See godbolt for the live demo.
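Here is a compact, self-contained variant of the same technique that you can compile locally. It is a sketch under some assumptions: it uses C++17 SFINAE (a hypothetical brace_init_test helper) instead of the C++20 requires-expression, counts members upward from zero instead of downward from sizeof(T), handles at most three members, and needs an aggregate whose members are all public. CountingArchive is an illustrative stand-in for a real archive.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Converts to any type; only used in unevaluated contexts, so it needs no definition
struct any_type {
    template <class T> operator T() const;
};

// true_type if T{A...} is well-formed, false_type otherwise
template <class T, class... A>
auto brace_init_test(int) -> decltype(void(T{std::declval<A>()...}), std::true_type{});
template <class T, class... A>
std::false_type brace_init_test(...);

// Count members by adding one more any_type initializer until T{...} stops compiling
template <class T, class... A>
constexpr std::size_t fields_count() {
    if constexpr (decltype(brace_init_test<T, A..., any_type>(0))::value)
        return fields_count<T, A..., any_type>();
    else
        return sizeof...(A);
}

// Tie up to three members; add branches for larger structs
template <class S>
auto to_tuple(S& s) {
    constexpr std::size_t count = fields_count<std::remove_cv_t<S>>();
    static_assert(count <= 3, "to_tuple: add branches for more members");
    if constexpr (count == 3) { auto& [f0, f1, f2] = s; return std::tie(f0, f1, f2); }
    else if constexpr (count == 2) { auto& [f0, f1] = s; return std::tie(f0, f1); }
    else if constexpr (count == 1) { auto& [f0] = s; return std::tie(f0); }
    else { return std::tie(); }
}

struct Widget {
    int a;
    double x;
    std::string s;

    template <class Archive>
    void serialize(Archive& ar) {
        // Every member reaches the archive, so a newly added member cannot be skipped
        std::apply([&ar](auto&... member) { (ar(member), ...); }, to_tuple(*this));
    }
};

// Illustrative archive that just counts the members it receives
struct CountingArchive {
    std::size_t count = 0;
    template <class T> void operator()(const T&) { ++count; }
};
```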
Upvotes: 3
This feature is coming with the (compile time) reflection feature. https://root.cern/blog/the-status-of-reflection/ talks about its status at a technical level last year.
Reflection was a C++23 priority at the time of writing, though it did not end up in C++23.
Until then, one approach I use is to write a single point of failure for all such operations. I call it as_tie:
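#include <tuple>
#include <type_traits>
#include <utility>

struct Foo {
    int x, y;
    // The single place that lists every member; accepts Foo&, const Foo& and Foo&&
    template<class Self, std::enable_if_t<std::is_same_v<Foo, std::decay_t<Self>>, bool> = true>
    friend auto as_tie(Self&& self) {
        static_assert(sizeof(self) == 8);  // sizeof(Foo); fires if a member is added
        return std::forward_as_tuple(decltype(self)(self).x, decltype(self)(self).y);
    }
    friend bool operator==(Foo const& lhs, Foo const& rhs) {
        return as_tie(lhs) == as_tie(rhs);
    }
};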
or somesuch depending on dialect.
Then your serializer/deserializer/etc. can use as_tie, perhaps via a foreach_tuple_element helper. Versioning can even be done: as_tie_v2_2_0 for an obsolete tie.
And if someone adds a member, the sizeof static assert probably fires.
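The answer leaves foreach_tuple_element undefined, so here is one possible C++17 way to write it (std::apply plus a fold expression), used to serialize Foo through as_tie. foreach_tuple_element, IntRecorder and the free serialize function are hypothetical names for this sketch, and the sizeof check assumes Foo holds exactly two ints with no padding, as on common platforms.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// One possible foreach_tuple_element: call f on each tuple element, in order
template <class Tuple, class F>
void foreach_tuple_element(Tuple&& t, F&& f) {
    std::apply([&f](auto&&... elems) { (f(std::forward<decltype(elems)>(elems)), ...); },
               std::forward<Tuple>(t));
}

struct Foo {
    int x, y;
    template <class Self, std::enable_if_t<std::is_same_v<Foo, std::decay_t<Self>>, bool> = true>
    friend auto as_tie(Self&& self) {
        // Fires when a member is added (assumes no padding between the two ints)
        static_assert(sizeof(Foo) == 2 * sizeof(int), "Foo changed: update as_tie");
        return std::forward_as_tuple(std::forward<Self>(self).x, std::forward<Self>(self).y);
    }
};

// Illustrative archive that records every int it receives
struct IntRecorder {
    std::vector<int> seen;
    void operator()(int v) { seen.push_back(v); }
};

// The serializer only goes through as_tie, the single list of members
template <class Archive>
void serialize(Archive& ar, const Foo& foo) {
    foreach_tuple_element(as_tie(foo), [&ar](const auto& member) { ar(member); });
}
```

Because the serializer never names members directly, adding a member means updating only as_tie, and the static_assert points at that spot.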
Upvotes: 3