Language Support for Tuples in D

Incomplete support for tuples is often discussed in the D community. I want to write a DIP on them. Here are some considerations.

Terminology

The notion of a tuple comes from math, where it means a fixed-length sequence of related but possibly completely different entities. In dynamic languages tuples are fixed-length lists of values of arbitrary types. The closest analogy in C-family languages would be an anonymous struct. In D things are somewhat more complicated though.

In D, a tuple is strictly a compile-time construct. It is a fixed-length list of types, compile-time expressions, or symbols, in any combination. There are two obvious and useful tuple sub-classes: type tuples consisting only of types, and expression tuples consisting only of expressions. These are mentioned in the documentation and seem to be purely conventional.
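
To make the three kinds concrete, here is a minimal sketch. It assumes a current compiler and uses std.meta.AliasSeq, which is the present-day name of the constructor this post calls TypeTuple; the alias names are made up for illustration.

```d
import std.meta : AliasSeq;

// A type tuple: every element is a type.
alias Types = AliasSeq!(int, string);

// An expression tuple: every element is a compile-time value.
alias Values = AliasSeq!(1, "two", 3.0);

// A mixed tuple: types and values side by side.
alias Mixed = AliasSeq!(int, 42, "x");

// All of this is checked at compile time; nothing exists at run time.
static assert(Types.length == 2);
static assert(is(Types[1] == string));
static assert(Values[0] == 1 && Values[1] == "two");
static assert(is(Mixed[0] == int) && Mixed[1] == 42);

void main() {}
```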

But there’s more to it. If you have a type tuple you can declare a variable of that type. I’m not sure if this is documented anywhere. This variable is a perfectly ordinary runtime variable. It has a tuple type even though the documentation asserts that tuples are not types. You can change the value of this variable, either as a whole by assigning a compatible expression tuple or another such variable to it, or by modifying individual components through indexing. Still, such a variable displays some tuple characteristics, like flattening when passed as a function argument.
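
A small sketch of that behavior, again assuming std.meta.AliasSeq as the tuple constructor. The functions sum and tupleVariableDemo are hypothetical names used only for this illustration.

```d
import std.meta : AliasSeq;

int sum(int a, int b) { return a + b; }

// Hypothetical helper, named only for this sketch.
int tupleVariableDemo()
{
    alias Pair = AliasSeq!(int, int);   // a type tuple

    Pair p;                 // a runtime tuple variable with two int components
    p[0] = 3;               // components are modified by indexing
    p[1] = 4;
    int first = sum(p);     // p flattens into two arguments: sum(3, 4)

    p = AliasSeq!(10, 20);  // whole-tuple assignment from an expression tuple
    return first + sum(p);  // 7 + 30
}

void main()
{
    assert(tupleVariableDemo() == 37);
}
```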

I will call such variables tuple variables, and the values in them tuple values. They’re distinct from tuples as such. This is why I don’t like the naming of std.typetuple.TypeTuple and std.typecons.Tuple: they’re both misnomers. TypeTuple is actually a generic tuple constructor that can create any tuple the compiler supports. Tuple is an anonymous struct constructor which contains a tuple variable as an alternative means to access the struct fields, but is otherwise not a tuple at all.
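
The following sketch shows what I mean about std.typecons.Tuple. The field names x and y and the helper functions are assumptions made for the example.

```d
import std.typecons : Tuple;

int sum(int a, int b) { return a + b; }

// Hypothetical helper, named only for this sketch.
int structTupleDemo()
{
    // A struct type with two named int fields.
    auto t = Tuple!(int, "x", int, "y")(3, 4);

    t.x = 5;                // named field access, as with any struct
    t[1] = 6;               // index access goes through the contained tuple variable

    // expand is that tuple variable; it flattens into two arguments.
    return sum(t.expand);   // sum(5, 6)
}

void main()
{
    assert(structTupleDemo() == 11);
}
```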

Thoughts

There are several things required so that the language feels like it supports tuples:

  • Sugar for type tuple construction
  • Sugar for tuple value construction: tuple literals
  • Parallel assignment

I agree with others in the community that reusing the comma operator for that would be perfect. I think it’s possible:

  • Let the comma operator yield a tuple instead of the value of the expression after the last comma
  • Allow types in place of expressions
  • Let the semantic analyzer reject mixed tuple literals as invalid
  • Treat type tuples as regular types, and expression tuples as tuple value literals
  • Stay compatible with C: when a tuple value is cast to a scalar type, replace the tuple value with its last element

Caveats

I can see two for now:

  1. Weird tuple to scalar cast
  2. Tuple flattening may conflict with C compatibility:

    void foo(...);
    foo((a, b, c));

    This code means foo(c) in C but foo(a, b, c) in this proposal

Point 1 can be dealt with: the cast can either be dropped from the proposal or made illegal, which I don’t think would hurt many people. But point 2 may be a real show-stopper.

9 Responses

  1. This proposal isn’t new at all. I’m not sure tuple flattening is desirable as a default behavior. It has been proposed that e.g. a *(…) syntax would flatten the tuple, analogous to how dereferencing a pointer works.

    Semantically a tuple type is like an anonymous struct. The type of a value tuple is a type tuple. Its value is an expression tuple. Assignments should work like with other built-in types. Tuple members should be indexable. Nesting of tuples should be allowed, e.g. (int, (int, int)) is different from (int,int,int). It’s the same thing with type tuples actually. Nesting them seems very useful, e.g. if you’re building expression templates.

    • I’m sure it’s not new. It’s that I’ve never seen a well thought-out proposal with all corner cases explained and solved. I may have just missed that. I’d surely appreciate a link.

      Flattening is a sensible default for meta-programming, where tuples come from: it lets T be passed seamlessly wherever T… is expected. I can’t remember ever wanting non-flattening behavior in my templates. In expressions you want nesting most of the time though. The solution could be to introduce non-flattening tuples into the compiler and to make the comma operator always produce those. This comes at the cost of additional language constructs and semantics to override the flattening behavior.
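
      The contrast between the two behaviors can be sketched with what the library offers today. This assumes std.meta.AliasSeq for compile-time tuples and std.typecons.Tuple for the struct-based kind; count, Inner, Nested and Flat are names invented for the example.

      ```d
      import std.meta : AliasSeq;
      import std.typecons : Tuple;

      // Counts template arguments; T... receives them already flattened.
      enum count(T...) = T.length;

      alias Inner = AliasSeq!(int, int);

      // Compile-time tuples splice into the enclosing argument list.
      static assert(count!(int, Inner) == 3);
      static assert(is(AliasSeq!(int, Inner) == AliasSeq!(int, int, int)));

      // std.typecons.Tuple is a struct, so nesting is preserved.
      alias Nested = Tuple!(int, Tuple!(int, int));
      alias Flat = Tuple!(int, int, int);
      static assert(!is(Nested == Flat));
      static assert(Nested.Types.length == 2);
      static assert(Flat.Types.length == 3);

      void main() {}
      ```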

  2. I’ve also been thinking about this problem, but from another angle. It’s been mentioned before, but in order to support matrices, etc., D needs both multi-dimensional slicing and mixed indexing and slicing (e.g. row5 = matrix[5,0..$]). But this means the current slicing syntax ( v[1..7] => v.opSlice(1,7) ) conflicts with indexing if they’re merged ( v[1,y] => v.opIndex(1,y) ). I’ve generally ended up using uint[2], etc. in matrix classes I’ve written, but it’s ugly and inconsistent with arrays: m[5,[0,$]]. Value tuples would also work internally, but even with a short syntax they’d still be inconsistent, unless that short syntax used the ‘..’ operator, which merging slicing and indexing frees up.

    Using ‘..’ would avoid the C syntax conflicts and appears to be a little less verbose (and definitely easier to type) at first glance.

    foo((a, b, c));
    foo(a..b..c);
    

    I think it would be fairly straightforward for ‘a..b’ or ‘(a,b)’ to provide sugar for either tuple(a,b) or TypeTuple!(a,b) as appropriate, but I don’t know about implementing Multiple Value Return (MVR) style parallel assignment. (Tuples of aliases don’t seem to be supported.)

    • Thanks for the insight. This is a different perspective indeed. I need to think about this a bit more. As to the parallel assignment, I have a couple of ideas. I’ll write about it in my next blog.

      • Well, you can do MVR today with alias template parameters, i.e.

        struct MVR(alias _x, alias _y) {
            void x(typeof(_x) v) { _x = v; }
            void y(typeof(_y) v) { _y = v; }
        }
        
        import std.stdio: writeln;
        
        void main(string[] args) {
        
            float a = 4;
            float b = 6;
        
            MVR!(a,b) mvr;
            mvr.x = 10;
            mvr.y = 20;
        
            writeln(a," ",b);
        }

        Currently, you can’t define a tuple of aliases, i.e. tuples can’t contain aliases. It’s been listed as bug 3072: http://d.puremagic.com/issues/show_bug.cgi?id=3072

  3. That MVR idea doesn’t look very usable compared to the ones in languages that actually have first-class tuples.

    E.g.

    val (a,b) = (4.0, 6.0) // Scala
    (a, b) = (4.0, 6.0) # Python
    let (a,b) = (4.0, 6.0) -- Haskell

    Regarding @Robert’s .. syntax proposal: isn’t .. already used to represent ranges in D2? FWIW, in mathematics ranges are exactly what that syntax stands for. The (a1, …, an) syntax for tuples is a standard. Almost nobody uses the C-style sequencing operator. It should be deprecated.

    • I was pointing out how MVR can be done in D today, which, given that D doesn’t natively support it, is very cool. It also means that you can implement it with syntactic sugar rather than a major feature addition, i.e.

      int..int foo() { return 1..10; }
      void main(string[] args) {
          int a,b;
          a..b = foo; // => MVR!(a,b) = foo();
          writeln(a," ",b);
      }
      

      And yes ‘..’ is currently used for slicing, but in order to support matrices, etc. (currently only 1D slicing works, mixed slicing and indexing is not possible), ‘..’ has to become syntactic sugar for something else, i.e. T[2] or tuple(a,b).
      I still use the C-style sequencing operator a lot in complex for loops and in variable declarations. I understand it’s also very important internally to the compiler, as it’s involved in a bunch of source code transforms. There’s also the principle in D that ported C code should either fail to compile or behave the same, i.e. no silently producing different results. Though I don’t know if this would affect the (,) syntax.

  4. I mean, how hard is it to imagine what a built-in tuple would look like?

    The data type (a1, …, an) can be encoded with
    struct anontuple_1 {
        field_type_1 a1;
        …
        field_type_n an;
    }

    The compiler internally stores the type information in order to allow things like assignments between tuples.

    Then e.g.
    a = (1,2)
    a._0 or a(0) returns 1. The compiler can transform this to anontuple_t1_instance.a1. There are dozens of languages that have this kind of data type. It shouldn’t be too hard to clone the behavior, right?!
