In cases that are syntactically ambiguous with previous versions of the grammar,
such as "with (expr):" or "with (expr1, expr2):", a PyWithStatement is still
parsed as having its own parentheses, not as containing a parenthesized
expression or a tuple as a single context expression. The latter interpretation,
even though syntactically legal, is still reported by the compatibility
inspection for Python versions below 3.9.
These changes also include proper formatter and editing support (e.g. not
inserting backslashes on line breaks inside parentheses), as well as an updated
Complete Current Statement action, which now takes possible parentheses into
account when inserting a missing colon.
The changes in the formatter are somewhat ad hoc, intended to minimize the effect
on other constructs. The "with" statement is special in the sense that it's
the first compound statement (one having a statement list) with its own list-like
part in parentheses.
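The ambiguity can be demonstrated with CPython's own `ast` module (a sketch of the grammar change, not of the IDE's parser; `a()`, `b()` are placeholder calls):

```python
import ast
import sys

# Classic multi-manager syntax, valid in every Python version:
# two separate context managers in one statement.
classic = "with a() as x, b() as y:\n    pass\n"
assert len(ast.parse(classic).body[0].items) == 2

# Parenthesized context managers (accepted by CPython's PEG parser
# since 3.9, official in 3.10): the parentheses belong to the
# statement itself, so there are still two context managers,
# not a single tuple context expression.
if sys.version_info >= (3, 9):
    grouped = "with (a() as x, b() as y):\n    pass\n"
    assert len(ast.parse(grouped).body[0].items) == 2
```

The tricky case from the message above, `with (expr1, expr2):`, has no `as` clauses, so older parsers read it as one tuple-valued context expression while the new grammar reads the parentheses as part of the statement.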
Existing tests on with statement processing were expanded and uniformly named.
Co-authored-by: Semyon Proshev <semyon.proshev@jetbrains.com>
GitOrigin-RevId: 15c33e97f177e81b5ed23891063555df016feb05
Because we didn't stop at a statement break token and continued recovery until
a colon, we used to treat the subsequent well-formed case clause as part of an
error about a missing pattern, thus moving the caret to that clause's colon.
GitOrigin-RevId: f4ee0e12876960e989de3dee89925b65e3cf2339
Namely, for their case clauses and inner comments.
Right now, each of them is indented on its own, as a separate formatting block.
It's still not entirely clear whether we should have a dedicated indented
container element for case clauses, similar to PyStatementList for statements.
It might simplify the formatter and some editing actions, but it could also
cause confusion between the two container elements.
GitOrigin-RevId: 69184d2f8f78e2e113e8f40a310bb13ac0b5e71a
All the corresponding PSI elements have empty interfaces for now. The API will be
"beefed up" as we start adding actual processing of them in the code insight
layer (e.g. for the upcoming CFG support and inspections).
The trickiest part of the parsing was the recovery. Patterns allow only
a limited subset of expression syntax, but I tried to sensibly consume and
report everything else (without building PSI for it), so that if a user starts
typing more general expressions in the middle of a pattern, we still give
meaningful error messages. That seems a likely source of errors when the feature
first rolls out in Python 3.10.
GitOrigin-RevId: fae40034964e4a25d91dab06a43d3fc07225d9c7
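The restriction can be illustrated with CPython's parser, which enforces the same pattern subset (a sketch guarded by version; `Point` is a placeholder class name):

```python
import ast
import sys

expression_rejected = True
if sys.version_info >= (3, 10):
    # A class pattern looks like a call but uses restricted pattern syntax.
    ast.parse("match p:\n    case Point(x=0, y=0):\n        pass\n")

    # A general expression such as "x + 1" is not a valid pattern,
    # so the parser rejects it outright.
    try:
        ast.parse("match p:\n    case x + 1:\n        pass\n")
        expression_rejected = False
    except SyntaxError:
        pass
assert expression_rejected
```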
We assumed that if the first token of a Python file is a string literal, then it's a docstring. That's not the case for the Python console, where each input is a separate "file".
Now we pass the `PythonLexerKind` to the `PythonLexer` so that we can parse string literals differently if we are in the console "file".
I've also added Cython as a lexical kind for the Python lexer, since at least one place in the lexical rules is specific to Cython. We also have separate lexer subclasses for Cython, so expressing it as a kind for the JFlex rules seems logical, even if we don't use it right now.
GitOrigin-RevId: f5e34fa2dc3b3da84cacf6cee69a4ba0ee674ad5
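The docstring assumption being fixed here mirrors how the stdlib treats modules versus interactive input (a small illustration with `ast`, not the plugin's lexer):

```python
import ast

# In a regular module, a leading string literal is the docstring.
module = ast.parse('"I am a docstring"\nx = 1\n')
assert ast.get_docstring(module) == "I am a docstring"

# In interactive (console) mode, each input is parsed separately,
# and a lone string is just an expression statement to evaluate.
console_input = ast.parse('"just a value"', mode="single")
assert isinstance(console_input.body[0], ast.Expr)
```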
It not only makes the AST of f-string nodes simpler and more obvious to work
with, but is also, in general, better supported by various platform
functionality that assumes that the raw text parts of string literals are not
broken into multiple elements.
GitOrigin-RevId: 931d1ea4c09d145e763aed839dcb4acbb3e43ec7
Every top-level expression inside an f-string that could be considered an assignment expression is actually a format expression.
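Concretely, everything after the first `:` in a replacement field is a format spec, so a would-be walrus needs parentheses (a quick sketch, assuming a Python 3.8+ runtime for the assignment expression):

```python
x = 10

# "x:=5" is NOT a walrus here: it formats x with the spec "=5"
# ('=' means sign-aware padding, 5 is the field width).
assert f"{x:=5}" == "   10"
assert format(10, "=5") == "   10"

# An actual assignment expression must be parenthesized.
assert f"{(x := 7)}" == "7"
assert x == 7
```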
GitOrigin-RevId: a375543c80d549d5c08166f33b401206ab31f8b8
Highlight expressions that are invalid by the grammar or unacceptable at runtime.
Highlight these expressions in Python versions below 3.8.
GitOrigin-RevId: 89acec9db5b3a931da31c33778185e147240ec01
so as not to cause confusion with ASTNode. Additionally, rename
PyLiteralStringElement to PyPlainStringElement, again to avoid confusing
users with the subtle difference in meaning between "string literal" and
"literal string".
There is actually no need to match them together with braces and then push
the second character back to the stream. The longest-match rule ensures that
all allowed escape sequences are recognised as one unit, so it seems safe to
match a backslash individually in all remaining cases.
Made the corresponding lexer states exclusive so that the line-comment pattern
doesn't win, as the longest match, over literal fragments starting with a hash
sign (consuming closing braces and quotes along with them). It also means that
we need to replicate the BAD_CHARACTER rule in these states as a fallback
pattern for unmatched input.
This is beneficial for two reasons: it avoids occasional PsiWhitespace elements
at the end of incomplete fragments, and it lets us report illegal line comments
better, because even though they still terminate f-strings, they now end up
under the corresponding AST nodes for expression fragments and can be processed
by the annotator in the same fashion.
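For reference, CPython also rejects a `#` inside a single-line f-string replacement field, though the reported error differs across versions (a sketch; either the `#` is forbidden outright, or, on newer parsers, the comment swallows the closing brace and quote and the string is unterminated):

```python
# Compiling an f-string with '#' in its expression fragment
# always fails with a SyntaxError of one kind or another.
rejected = False
try:
    compile('f"{1 # not a comment}"', "<example>", "eval")
except SyntaxError:
    rejected = True
assert rejected
```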