ksandarusi

Reputation: 150

How do I determine what a tab character is when parsing a file?

I am opening a file in Perl and want to know how to recognize a tab character in it.

I know the tabs are in my file, but I don't know how to identify them. I know that when writing to a file you use \t, but it doesn't seem to be the same when reading a file.

I also know that Perl reads them as some sort of tab character, because I printed each line character by character and could easily see the tabbed lines.

Upvotes: 0

Views: 1684

Answers (2)

Jarmund

Reputation: 3205

Some IDEs and editors insert four spaces instead of a tab character when you press the Tab key. The actual tab character is \t in Perl, for reading as well as for writing (the underlying byte value depends on the platform's character set, 0x09 on ASCII-based systems, but \t always represents the tab character for your platform).

To catch both tab characters and groups of four spaces, you could match against the regex /\t| {4}/
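As a rough illustration of that regex, here is a minimal sketch that counts tab characters and four-space groups in each line. The helper name count_indent_tokens and the sample lines are made up for this example; in a real script the lines would come from your file handle.

```perl
use strict;
use warnings;

# Hypothetical helper: count tab characters and four-space groups in one line.
sub count_indent_tokens {
    my ($line) = @_;
    # The "= () =" idiom forces list context and yields the number of matches.
    my $tabs  = () = $line =~ /\t/g;
    my $quads = () = $line =~ / {4}/g;
    return ($tabs, $quads);
}

# Sample data standing in for lines read from a file.
my @lines = ("\tfoo\n", "        bar\n", "baz\n");
for my $line (@lines) {
    my ($tabs, $quads) = count_indent_tokens($line);
    print "tabs=$tabs four-space groups=$quads\n";
}
```

Counting the two patterns separately, rather than with the combined alternation, makes it easy to see which style of indentation a file actually uses.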

Upvotes: 0

mvp

Reputation: 116107

The tab character is always \t; there is nothing more to say about it.
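To make that concrete, here is a minimal sketch showing that the same \t escape used for output also matches tabs in input. It reads from an in-memory string so that it is self-contained; the sample data is invented, and a real script would open a file path instead.

```perl
use strict;
use warnings;

# Sample input held in memory; the contents are made up for this example.
my $data = "name\tvalue\nfoo\t42\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {
    chomp $line;
    # "\t" in the pattern matches the tab characters read from the input.
    push @rows, [ split /\t/, $line ];
}
close $fh;

# On ASCII-based platforms the tab character has code 9.
printf "tab is character code %d\n", ord("\t");
print join(" | ", @$_), "\n" for @rows;
```

The same escape also works in index, tr and ordinary regex matches, so no special handling is needed on the reading side.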

However, editors follow different conventions for how many spaces a single tab character represents. Conventional wisdom says 8, but people often mean 4, and I have seen it mean 3 and even 2 spaces.

Some editors (like Komodo or Komodo Edit) try to be smart: they read the source file and look at the typical distribution of leading spaces and tabs. For example, if only 4, 8, 12, ... leading spaces are seen, the editor may implicitly assume that a tab should mean 4 spaces. Or, if 2, 4, 6, ... leading spaces are observed, it may use 2 spaces per tab.

If I understood you correctly, you want similar behavior for leading spaces.

In this case, you can determine the most likely tab-to-space value using the code below. Note that this code is not optimal: it does not account for lines indented with actual tabs, it only uses the first indentation level to derive the tab width, and so on. Treat it only as a starting point for a proper implementation:

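use strict;
use warnings;

# Count how many lines start with each number of leading spaces.
my %dist;
while (my $line = <>) {
    my ($spaces) = $line =~ /^( *)/;
    $dist{ length $spaces }++;
}

# Distinct indentation widths, smallest first.
my @sp = sort { $a <=> $b } keys %dist;
print "Leading space distribution in file: ", join(",", @sp), "\n";

# The gap between the two smallest widths approximates one indent level.
if (@sp >= 2) {
    print "Most likely tab setting is: ", $sp[1] - $sp[0], "\n";
}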
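
One possible refinement, offered only as a sketch and not as what any particular editor does: instead of looking at the two smallest widths, take the greatest common divisor of all observed leading-space counts, skipping blank lines and lines indented with tabs. The names gcd and guess_indent are hypothetical helpers written for this example.

```perl
use strict;
use warnings;

# Greatest common divisor by Euclid's algorithm.
sub gcd {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x % $y) while $y;
    return $x;
}

# Hypothetical helper: guess the indent width from a list of lines.
# Returns 0 when no line starts with spaces.
sub guess_indent {
    my @lines = @_;
    my $width = 0;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;    # ignore blank lines
        next if $line =~ /^ *\t/;    # ignore lines indented with tabs
        my ($spaces) = $line =~ /^( *)/;
        $width = gcd($width, length $spaces);
    }
    return $width;
}

# Sample lines standing in for file contents.
print guess_indent("a\n", "    b\n", "        c\n", "    d\n"), "\n";
```

This uses every indentation level rather than just the first, but a single oddly aligned line (for example a continuation line with 5 leading spaces) will pull the result down to 1, so a real implementation would probably want to weight widths by how often they occur.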

Upvotes: 2
