Reputation: 23759
Here is the MWE:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Net::IMAP::Client;
use Encode qw/decode/;
use open ':std', ':encoding(UTF-8)';
my $user = '[email protected]';
my $pwd = 'secret';
my $imap = Net::IMAP::Client->new(
server => 'imap.gmail.com',
user => $user,
pass => $pwd,
ssl => 1, # (use SSL? default no)
ssl_verify_peer => 1, # (use ca to verify server, default yes)
port => 993
) or die "Could not connect to IMAP server: $!";
$imap->login or die('Login failed: ' . $imap->last_error);
# all the incoices from my telephone company
$imap->select('INBOX');
my $messages = $imap->search({
from => '[email protected]',
#subject => '2020',
});
unless(defined($messages))
{
$imap->logout();
die "no messages";
}
foreach my $id (@$messages)
{
my $summary = $imap->get_summaries([$id])->[0];
my $subject = $summary->subject;
$subject = decode('MIME-Header', $subject);
print $subject."\n";
}
This outputs all the invoices from [email protected]
in the mailbox:
Счёт за услуги ПАО МГТС за Июнь 2017 г.
Счёт за услуги ПАО МГТС за Июль 2017 г.
Счёт за услуги ПАО МГТС за Август 2017 г.
Счёт за услуги ПАО МГТС за Ноябрь 2017 г.
Счёт за услуги ПАО МГТС за Декабрь 2017 г.
Счёт за услуги ПАО МГТС за Ноябрь 2018 г.
Счёт за услуги ПАО МГТС за Декабрь 2018 г.
Счёт за услуги ПАО МГТС за Декабрь 2019 г.
Счёт за услуги ПАО МГТС за Март 2020 г.
Счёт за услуги ПАО МГТС за Апрель 2020 г.
Everything is correct.
Now I add the new condition, uncommenting:
#subject => '2020',
And I get all the invoices from 2020:
Счёт за услуги ПАО МГТС за Март 2020 г.
Счёт за услуги ПАО МГТС за Апрель 2020 г.
But when I add the word "Апрель" (April) to the search:
subject => 'Апрель 2020',
I get "no messages", despite the fact, that this substring exists in the subject of 1 email in the mailbox.
Subject in the email looks like this:
Subject: =?utf-8?Q?=D0=A1=D1=87=D1=91=D1=82=20?=
What's wrong and how to fix?
Upvotes: 3
Views: 810
Reputation: 3653
It seems like Net::IMAP::Client
doesn't support IMAP4rev1 that covers UTF-8 encoded strings.
According to Gmail documentation, you can use X-GM-RAW attribute to fetch the same result of Gmail web interface.
If I must use Net::IMAP::Client
, I would add a new method to perform X-GM-RAW
:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Net::IMAP::Client;
use Encode qw/decode/;
use IO::Socket qw(:crlf);
use open ':std', ':encoding(UTF-8)';
my $user = '[email protected]';
my $pwd = 'secret';
my $imap = Net::IMAP::Client->new(
server => 'imap.gmail.com',
user => $user,
pass => $pwd,
ssl => 1, # (use SSL? default no)
ssl_verify_peer => 1, # (use ca to verify server, default yes)
port => 993
) or die "Could not connect to IMAP server: $!";
$imap->login or die('Login failed: ' . $imap->last_error);
# Add search_gmail method to Net::IMAP::Client
sub Net::IMAP::Client::search_gmail {
my ($self, $criteria) = @_;
my @crit;
for my $key (keys %{$criteria}) {
push @crit, join ":", $key, $criteria->{$key};
}
my $crit_str = join q{ }, @crit;
my ($ok, $lines);
($ok, $lines) = $self->_tell_imap('SEARCH' => "CHARSET UTF-8 X-GM-RAW " . do {
use bytes;
sprintf qq{{%d}%s%s}, length($crit_str), $CRLF, $crit_str;
});
return unless $ok;
for my $line (@{$lines->[1]}) {
if ($line =~ s/^\*\s+SEARCH\s+//ig) {
$line =~ s/\s*$//g;
return [ map { $_ + 0 } split(/\s+/, $line) ];
}
}
}
# all the incoices from my telephone company
$imap->select('INBOX');
my $messages = $imap->search_gmail({
from => '[email protected]',
#subject => '2020',
});
unless(defined($messages))
{
$imap->logout();
die "no messages";
}
foreach my $id (@$messages)
{
my $summary = $imap->get_summaries([$id])->[0];
my $subject = $summary->subject;
$subject = decode('MIME-Header', $subject);
print $subject."\n";
}
Upvotes: 4
Reputation: 185851
What I would start treating Unicode:
use strict; use warnings;
use utf8;
binmode $_, ":utf8" for qw/STDOUT STDIN STDERR/;
With recent Perl:
use feature 'unicode_strings';
https://perldoc.perl.org/perlunicode.html#Important-Caveats
Upvotes: -1
Reputation: 40778
(See at the bottom of this post for a Python example that seems to work)
I tried another module Net::IMAP::Simple::Gmail since it has an option for debug
output (I first sent an email to myself with subject Апрель 2020
such that I could test the behavior easily):
use feature qw(say);
use strict;
use warnings;
use utf8;
use Net::IMAP::Simple::Gmail;
use Encode qw(encode_utf8);
my $server = 'imap.gmail.com';
my $imap = Net::IMAP::Simple::Gmail->new($server, debug => 1);
my $user = '[email protected]';
my $pass = 'mypass';
if(!$imap->login($user,$pass)){
die "Login failed: " . $imap->errstr . "\n";
}
my $num_messages = $imap->select('INBOX') or die $imap->errstr;
my @ids = $imap->search(encode_utf8('SUBJECT "Апрель 2020"'));
say "Found ", (scalar @ids), " messages";
Output:
[...l/5.30.0/Net/IMAP/Simple.pm line 133 in sub _connect] connecting to imap.gmail.com:993
[...l/5.30.0/Net/IMAP/Simple.pm line 133 in sub _connect] connected, returning socket
[./p.pl line 11 in sub new] waiting for socket ready
[./p.pl line 11 in sub new] looking for greeting
[./p.pl line 11 in sub new] got a greeting: * OK Gimap ready for requests from 51.174.5.83 u18mb43700719ljl\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1252 in sub _send_cmd] 0 LOGIN [email protected] "mypass"\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 265 in sub _process_cmd] * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE ENABLE MOVE CONDSTORE ESEARCH UTF8=ACCEPT LIST-EXTENDED LIST-STATUS LITERAL- SPECIAL-USE APPENDLIMIT=35651584\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE ENABLE MOVE CONDSTORE ESEARCH UTF8=ACCEPT LIST-EXTENDED LIST-STATUS LITERAL- SPECIAL-USE APPENDLIMIT=35651584\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 265 in sub _process_cmd] 0 OK [email protected] authenticated (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 0 OK [email protected] authenticated (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1252 in sub _send_cmd] 1 SELECT "INBOX"\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * FLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk \*)] Flags permitted.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen $MailFlagBit0 $MailFlagBit1 $NotJunk $NotPhishing $Phishing NotJunk \*)] Flags permitted.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [UIDVALIDITY 638142060] UIDs valid.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [UIDVALIDITY 638142060] UIDs valid.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * 27869 EXISTS\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * 27869 EXISTS\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * 0 RECENT\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * 0 RECENT\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [UIDNEXT 32724] Predicted next UID.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [UIDNEXT 32724] Predicted next UID.\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] * OK [HIGHESTMODSEQ 4375397]\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] * OK [HIGHESTMODSEQ 4375397]\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 413 in sub _process_cmd] 1 OK [READ-WRITE] INBOX selected. (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 1 OK [READ-WRITE] INBOX selected. (Success)\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1252 in sub _send_cmd] 2 SEARCH SUBJECT "Апрель 2020"\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 615 in sub _process_cmd] 2 BAD Could not parse command\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 2 BAD Could not parse command\r\n
[...l/5.30.0/Net/IMAP/Simple.pm line 1192 in sub _seterrstr] Could not parse command\r
Found 0 messages
Notice the output:
[...l/5.30.0/Net/IMAP/Simple.pm line 1277 in sub _cmd_ok] 2 BAD Could not parse command\r\n
This is the socket returning 2 BAD Could not parse command
at line 1260
Similarly, using the Python IMAPClient module:
user = '[email protected]';
passw = "mypass";
from imapclient import IMAPClient
server = IMAPClient('imap.gmail.com', use_uid=True, ssl=True)
result = server.login(user, passw)
print(result)
select_info = server.select_folder('INBOX')
print(select_info)
messages = server.search(['SUBJECT', 'Апрель 2020'.encode('utf8')])
print(messages)
Gives output:
b'[email protected] authenticated (Success)'
{b'PERMANENTFLAGS': (b'\\Answered', b'\\Flagged', b'\\Draft', b'\\Deleted', b'\\Seen', b'$MailFlagBit0', b'$MailFlagBit1', b'$NotJunk', b'$NotPhishing', b'$Phishing', b'NotJunk', b'\\*'), b'FLAGS': (b'\\Answered', b'\\Flagged', b'\\Draft', b'\\Deleted', b'\\Seen', b'$MailFlagBit0', b'$MailFlagBit1', b'$NotJunk', b'$NotPhishing', b'$Phishing', b'NotJunk'), b'UIDVALIDITY': 638142060, b'EXISTS': 27869, b'RECENT': 0, b'UIDNEXT': 32724, b'HIGHESTMODSEQ': 4375417, b'READ-WRITE': True}
Traceback (most recent call last):
File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 982, in _search
data = self._raw_command_untagged(b'SEARCH', args)
File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 1445, in _raw_command_untagged
typ, data = self._raw_command(command, args, uid=uid)
File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 1507, in _raw_command
return self._imap._command_complete(to_unicode(command), tag)
File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/imaplib.py", line 1027, in _command_complete
raise self.error('%s command error: %s %s' % (name, typ, data))
imaplib.error: SEARCH command error: BAD [b'Could not parse command']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./t.py", line 11, in <module>
messages = server.search(['SUBJECT', 'Апрель 2020'.encode('utf8')])
File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 956, in search
return self._search(criteria, charset)
File "/home/hakon/.pyenv/versions/3.7.3/lib/python3.7/site-packages/imapclient/imapclient.py", line 995, in _search
criteria='"%s"' % criteria if not isinstance(criteria, list) else criteria
imapclient.exceptions.InvalidCriteriaError: b'Could not parse command'
This error may have been caused by a syntax error in the criteria: ['SUBJECT', b'\xd0\x90\xd0\xbf\xd1\x80\xd0\xb5\xd0\xbb\xd1\x8c 2020']
Please refer to the documentation for more information about search criteria syntax..
https://imapclient.readthedocs.io/en/master/#imapclient.IMAPClient.search
Finally I found a Python example that actually seems to work:
import imaplib
user = "[email protected]"
passw = "mypass";
sock = imaplib.IMAP4_SSL("imap.gmail.com", 993)
sock.login(user, passw)
sock.select()
sock.debug = 4
sock.literal = u"Апрель 2020".encode('utf8')
res = sock.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')
print(res)
Output:
04:44.46 > b'DMOM3 UID SEARCH CHARSET UTF-8 SUBJECT {17}'
04:44.52 < b'+ go ahead'
04:44.52 write literal size 17
04:44.67 < b'* SEARCH 32720'
04:44.67 < b'DMOM3 OK SEARCH completed (Success)'
('OK', [b'32720'])
Upvotes: 1