Google Assistant is coming to lots more gadgets this year

At the IFA expo in Germany, Google announced that it’s letting third-party hardware makers bake its AI-powered Assistant into their gadgets and home appliances, which means you’ll soon be able to control all kinds of devices just by saying, “OK Google…” around the house.

Assistant currently lives on mobile devices and the Google Home speaker; breaking it out of the company’s ecosystem of software and hardware makes it more widely accessible. It’ll be available on speakers from brands like Panasonic, Anker’s new Zolo label and Mobvoi (pictured above). As with Google Home, you’ll be able to play music, ask questions and control your other smart gadgets. However, these new products don’t mention support for voice calls like the Home speaker – so if they’re cheaper than Google’s own offering, that might be one of the big reasons why.

The search giant also revealed that Assistant commands will be supported by upcoming washers, dryers and vacuums from manufacturers like LG later this year.

That should help Assistant catch up to Amazon’s Alexa, which already works with third-party gear. Google is right to be upping its game now, as we just learned that Alexa and Microsoft’s Cortana will soon be able to communicate with each other; Apple is also slated to get into the smart speaker game in December with its HomePod.

Expect to hear about more smart gadgets from IFA, and plenty of ranting about too many assistants, a lack of privacy and unavailability of these services in India from yours truly in the coming days.

Las Vegas Apple stores prepare for new gadget reveal

Shoppers take advantage of a mild afternoon to shop at the Apple Summerlin store at Downtown Summerlin shopping center on Tuesday, Feb. 28, 2017, in Las Vegas. (Benjamin Hager/Las Vegas Review-Jou ...

Apple is slated to release several new devices and gadgets Tuesday — the iPhone’s 10th anniversary.

The tech giant is hosting what a company spokeswoman called a “special event” at 10 a.m. at its recently constructed Steve Jobs Theater inside the Apple Park campus in Cupertino, California. At least three Las Vegas Valley Apple stores will live stream the event.

“We always encourage our customers with all the excitement to come down to the store if they want to watch the event live,” said Rachel Johnson, an Apple Store manager at Town Square.

Johnson said the store will live stream the event on an Apple TV displayed inside the store, and said she has “no idea” what to expect.

A huge leak over the weekend to a handful of Apple blogs (including 9to5Mac and MacRumors), though, suggests the tech giant will release three phones Tuesday: the iPhone 8, iPhone 8 Plus and iPhone X, with prices starting around $1,000. The new phones are reported to be equipped with features like wireless charging, more camera options and a facial recognition system.

“We don’t know exactly what they’ll be announcing,” Johnson said. “We don’t know anything at the store level until the announcement actually happens.”

People have come to watch live streams of Apple announcements in the past, she said.

Ryan Manzon, an Apple Store specialist at the Fashion Show mall, and an employee at the Forum Shops also confirmed the stores will have a live stream.

An Apple spokeswoman did not address questions about other ways in which local stores prepare for the event, or questions about the new gadgets.

“You’ll have to wait for the surprise and delight!” she said in an email.

TechInsights Confirms Apple’s A10X SoC Is TSMC 10nm FF; 96.4mm² Die Size

One of the more intriguing mysteries in the Apple ecosystem has been the question of which process the company would use for the A10X SoC, which is being used in the newly launched 2017 iPad Pro family. Whereas the A10 used in the iPhone was much too early to use anything but 16nm/14nm, the iPad Pro and A10X are coming in the middle of the transition point for high-end SoCs. 16nm is still a high performance process, but if a company pushes the envelope, 10nm is available. So what would Apple do?

The answer, as it turns out, is that they’ve opted to push the envelope. The intrepid crew over at TechInsights has finally dissected an A10X and posted their findings, giving us our first in-depth look at the SoC. Most importantly then, TechInsights is confirming that the chip has been fabbed on TSMC’s new 10nm FinFET process. In fact, the A10X is the first TSMC 10nm chip to show up in a consumer device, a very interesting turn of events since that wasn’t what various production roadmaps called for (that honor would have gone to MediaTek’s Helio X30).


Image Courtesy TechInsights

Apple is of course known for pushing the envelope on chip design and fabrication; they have the resources to take risks, and the profit margins to cover them should they not pan out. Still, that the A10X is the first 10nm SoC is an especially interesting development because it’s such a high-end part. Traditionally, smaller and cheaper parts are the first out the door as these are less impacted by the inevitable yield and capacity challenges of an early manufacturing node. Instead, Apple seems to have gone relatively big with what amounts to their 10nm pipecleaner part.

I say “relatively big” here because while the A10X is a powerful part, and big for a 10nm SoC, in terms of absolute die size it’s not all that big of a chip. In fact by Apple X-series SoC standards, it’s downright small: just 96.4mm². This is 23% smaller than the 16nm A10 SoC (125mm²), and in fact is even 8% smaller than the A9 SoC (104.5mm²). So not only is it smaller than any of Apple’s 16nm SoCs, but it’s also about 20% smaller than the next-smaller X-series SoC, the A6X. Or, if you want to compare it to the previous A9X, Apple’s achieved a 34% reduction in die size. In other words, Apple has never made an iPad SoC this small before.
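The comparisons above are simple ratios of die area. As a rough check, the sketch below recomputes the A10 and A9 comparisons from the die sizes quoted in the paragraph; the function name is illustrative, and the printed values can be set against the rounded figures in the text.

```python
# Die areas in mm^2, as quoted in the paragraph above.
DIE_AREA_MM2 = {"A10X": 96.4, "A10": 125.0, "A9": 104.5}


def percent_smaller(new_area: float, old_area: float) -> float:
    """Return how much smaller new_area is than old_area, in percent."""
    return (old_area - new_area) / old_area * 100.0


for chip in ("A10", "A9"):
    saving = percent_smaller(DIE_AREA_MM2["A10X"], DIE_AREA_MM2[chip])
    print(f"A10X vs {chip}: {saving:.1f}% smaller")
```

The A9X and A6X comparisons are left out because their die sizes are not quoted in this article.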

One key difference here however is that the X-series SoCs have never before been the leading part for a new process node. It has always been iPhone SoCs that have led the charge: A9 at 16nm, A8 at 20nm, A7 at 28nm, etc. This does mean that as a pipecleaner part, Apple does need to be especially mindful of the risks. If an X-series SoC is to lead the charge for the 10nm generation, then it can’t be allowed to be too big. Not that this has stopped Apple from packing in three CPU cores and a 12-cluster GPU design.

Speaking of size, TechInsights’ estimates for area scaling are quite interesting. Based on their accounting, they believe that Apple has achieved a 45% reduction in area versus 16nm, which is what you would expect from a full node’s improvement. This matches TSMC’s earlier statements, but given the challenges involved in bringing newer processes to market, it’s nonetheless exciting to actually see it happening. For chip vendors designing products against 10nm and its 7nm sibling, this is good news, as small die sizes are the rule for pretty much everyone besides Apple.

A10X Architecture: A10 Enlarged

Diving a bit deeper, perhaps the biggest reason that A10X is as small as it is, is that Apple seems to have opted to be conservative with its design. Which again, for a pipecleaner part, is what you’d want to do.

Apple SoC Comparison

| | A10X | A9X | A8X | A6X |
|---|---|---|---|---|
| CPU | 3x Fusion (Hurricane + Zephyr) | 2x Twister | 3x Typhoon | 2x Swift |
| CPU Clockspeed | ~2.36GHz | 2.26GHz | 1.5GHz | 1.3GHz |
| GPU | 12 Cluster GPU | PVR 12 Cluster Series7 | Apple/PVR GXA6850 | PVR SGX554 MP4 |
| Typical RAM | 4GB LPDDR4 | 4GB LPDDR4 | 2GB LPDDR3 | 1GB LPDDR2 |
| Memory Bus Width | 128-bit | 128-bit | 128-bit | 128-bit |
| Memory Bandwidth | TBD | 51.2GB/sec | 25.6GB/sec | 17.1GB/sec |
| L2 Cache | 8MB | 3MB | 2MB | 1MB |
| L3 Cache | None | None | 4MB | N/A |
| Manufacturing Process | TSMC 10nm FinFET | TSMC 16nm FinFET | TSMC 20nm | Samsung 32nm |

We know from Apple’s official specifications that the A10X has 3 Fusion CPU core pairs, up from 2 pairs on A10, and 2 Twister CPU cores on A9X, all with 8MB of L2 cache tied to the CPU. Meanwhile the GPU in A10X is relatively unchanged; A9X shipped with a 12 cluster GPU design, and so does A10X. This means that Apple hasn’t invested their die space gains from 10nm in much in the way of additional hardware. To be sure, it’s not just a smaller A9X, but it’s also not the same kind of generational leap that we saw from A8X to A9X or similar iterations.

Unfortunately TechInsights’ public die shot release isn’t quite big enough or clean enough to draw a detailed floorplan from, but at a very high level we can make out the 12 GPU clusters on the left, along with the CPU cores to the right. Significantly, there aren’t any real surprises here. TechInsights heavily compares it to the A9X and there’s good reason to do so. IP blocks have been updated, but the only major change is the CPU cores, and those don’t take up a lot of die space relative to the GPU cores. This is what allows A10X to be more powerful than A9X while enjoying such a significant die size decrease.

As for the GPU in particular, Apple these days is no longer officially specifying whether they’re using Imagination’s PowerVR architecture in their chips. Furthermore we know that Apple is developing their own GPU, independent from Imagination’s designs, and that it will be rolled out sooner rather than later. With that said, even prior to today’s die shot release it’s been rather clear that A10X is not that GPU, and the die shot further proves that.

Apple’s developer documentation has lumped in the A10X’s GPU with the rest of the iOS GPU Family 3, which comprises all of the A9 and A10 family SoCs. So from a feature-set perspective, A10X’s GPU isn’t bringing anything new to the table. As for the die shot, as TechInsights correctly notes, the GPU clusters in the A10X look almost exactly like the A9X’s clusters (and the A10’s, for that matter), further indicating it’s the same base design.


Image Courtesy TechInsights

Ultimately what this means is that in terms of design and features, A10X is relatively straightforward. It’s a proper pipecleaner product for a new process, and one that is geared to take full advantage of the die space savings as opposed to spending those savings on new features/transistors.

Otherwise I am very curious as to just what this means for power consumption – is Apple gaining much there, or is it all area gains? A10X’s CPU clockspeed is only marginally higher than A9X’s, and pretty much identical to A10, so we can see that Apple hasn’t gained much in the way of clockspeeds. So does that mean that Apple instead invested any process-related gains in reducing power consumption, or, as some theories go, has 10nm not significantly improved on power consumption versus 16nm? But the answer to that will have to wait for another day.

Huawei Mate 10 and Mate 10 Pro Launch on October 16th, More Kirin 970 Details

Riding on the back of the ‘not-announced then announced’ initial set of Kirin 970 details, Huawei had one of the major keynote presentations at the IFA trade show this year, going into more detail on the new SoC and its AI features, and also providing some salient information about the next flagship phone. Richard Yu, CEO of Huawei’s Consumer Business Group (CBG), announced that the Huawei Mate 10 and Mate 10 Pro will be launched on October 16th, at an event in Munich, and will feature both the Kirin 970 SoC and a new minimal-bezel display.


Kirin 970 PCB vs Intel Core i7 Laptop Sticker

Suffice to say, that is basically all we know about the Mate 10 at this point: a new display technology, and a new SoC with additional AI hardware under-the-hood to start the process of using AI to enhance the experience. When speaking with both Clement Wong, VP of Global Marketing at Huawei, and Christophe Coutelle, Director of Device Software at Huawei, it was clear that they have large, but progressive goals for the direction of AI. The initial steps demonstrated were to assist in providing the best camera settings for a scene by identifying the objects within it, a process that can be accelerated by AI and consume less power. The two from Huawei were also keen to probe the press and attendees at the show about what they thought of AI, and in particular the functions it could be applied to. One of the issues of developing hardware specifically for AI is not really the hardware itself, but the software that uses it.

The Neural Processing Unit (NPU) in the Kirin 970 is using IP from Cambricon Technology (thanks to jjj for the tip, we confirmed it). In speaking with Eric Zhou, Platform Manager for HiSilicon, we learned that the licensing for the IP is different to the licensing agreements in place with, say, ARM. Huawei uses ARM core licenses for their chips, which restricts what Huawei can change in the core design: essentially you pay to use ARM’s silicon floorplan / RTL and the option is only one of placement on the die (along with voltage/frequency). With Cambricon, the agreement around the NPU IP is more of a joint collaboration: both sides helped progress the IP beyond the paper stage with updates and enhancements all the way to final 10nm TSMC silicon.

We learned that the IP is scalable, but at this time is only going to be limited to Huawei devices. The configuration of the NPU internally is based on multiple matrix multiply units, similar to that shown in Google’s TPU and NVIDIA’s Tensor core, found in Volta. In Google’s first TPU, designed for neural network inference, there was a single 256×256 matrix multiply unit doing the heavy lifting. For the TPUv2, as detailed back at the Hot Chips conference a couple of weeks ago, Google has moved to dual 128×128 matrix multiply units. In NVIDIA’s biggest Volta chip, the V100, they have placed 640 tensor cores each capable of a 4×4 matrix multiply. The Kirin 970 NPU by contrast, as we were told, uses 3×3 matrix multiply units and a number of them, although that number was not provided.
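To make the idea of building a larger multiply out of small fixed-size units concrete, here is a minimal sketch of a blocked matrix multiply that only ever multiplies 3×3 tiles. It is purely illustrative: how the NPU actually schedules work across its units was not disclosed, and the function names and loop order below are assumptions.

```python
TILE = 3  # unit size quoted for the Kirin 970 NPU; the scheduling below is hypothetical


def unit_multiply_accumulate(acc, a, b):
    """One pass through a TILE x TILE multiply unit: acc += a @ b."""
    for i in range(TILE):
        for j in range(TILE):
            acc[i][j] += sum(a[i][k] * b[k][j] for k in range(TILE))


def get_tile(m, row, col):
    """Copy the TILE x TILE block of m whose top-left corner is (row, col)."""
    return [r[col:col + TILE] for r in m[row:row + TILE]]


def tiled_matmul(a, b):
    """Multiply square matrices whose size is a multiple of TILE.

    Returns the product and the number of unit operations issued.
    """
    n = len(a)
    assert n % TILE == 0, "pad the matrices to a multiple of TILE first"
    c = [[0] * n for _ in range(n)]
    unit_ops = 0
    for i in range(0, n, TILE):
        for j in range(0, n, TILE):
            acc = [[0] * TILE for _ in range(TILE)]
            for k in range(0, n, TILE):
                unit_multiply_accumulate(acc, get_tile(a, i, k), get_tile(b, k, j))
                unit_ops += 1
            # Write the finished output tile back into the result matrix.
            for di in range(TILE):
                c[i + di][j:j + TILE] = acc[di]
    return c, unit_ops


a = [[(r * 6 + col) % 5 for col in range(6)] for r in range(6)]
b = [[(r + 2 * col) % 7 for col in range(6)] for r in range(6)]
product, ops = tiled_matmul(a, b)
print(f"unit operations for a 6x6 multiply: {ops}")
```

The unit operations for different output tiles are independent of each other, which is why a design can put many small units side by side and run them in parallel.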

One other element to the NPU that was interesting was that its performance was quoted in terms of 16-bit floating point precision. When compared to the other chips listed above, Google’s TPU works best with 8-bit integer math, while NVIDIA’s Tensor Core does 16-bit floating point as well. When asked, Eric stated that at this time, FP16 implementation was preferred although that might change, depending on how the hardware is used. As an initial implementation, FP16 was more inclusive of different frameworks and trained algorithms, especially as the NPU is an inference-only design.
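One plausible reading of “more inclusive” is that weights trained in FP32 can be rounded to FP16 directly, whereas INT8 needs a quantization scale and loses small values. The sketch below illustrates that difference; it is not Huawei’s stated reasoning, the weight values are made up, and the symmetric per-tensor scheme is just one common way to quantize.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (ints, scale)."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale


# Hypothetical FP32-trained weights with a wide spread of magnitudes.
weights = [0.0012, -0.034, 0.5, -1.75, 3.2]

fp16_weights = [to_fp16(w) for w in weights]
ints, scale = quantize_int8(weights)
int8_weights = [q * scale for q in ints]

fp16_err = max(abs(w - f) for w, f in zip(weights, fp16_weights))
int8_err = max(abs(w - q) for w, q in zip(weights, int8_weights))
print(f"max FP16 rounding error: {fp16_err:.6f}")
print(f"max INT8 rounding error: {int8_err:.6f}")
```

With these example values the smallest weight quantizes to zero in INT8, while FP16 keeps it with a small relative error. Real deployments often recover INT8 accuracy with calibration or retraining, which is extra work a first-generation FP16 design avoids.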

At the keynote, and confirmed in our discussions after, Huawei stated that the API to use the NPU will be available for developers. The unit as a whole will support the TensorFlow and TensorFlow Lite frameworks, as well as Caffe and Caffe2. The NPU can be accessed via Huawei’s own Kirin AI API, or Android’s NN API, relying on Kirin’s AI Heterogeneous Resource Management tools to split the workloads between CPU, GPU, DSP and NPU. I suspect we’ll understand more about this nearer to the launch. Huawei did specifically state that this will be an ‘open architecture’, but failed to mention exactly what that meant in this context.

The Kirin 970 will be available on a development board/platform for other engineers and app developers in early Q1, similar to how the Kirin 960 was also available. This will also include a community, support, dedicated tool chains and a driver development kit.

We did learn that the NPU is the size of ‘half a core’, although it was hard to tell if this was ‘half of a single core (an A73 or an A53)’ or ‘half of the cores (all the cores put together)’. We did confirm that the die size is under 100mm², although an exact number was not provided. That implies a transistor density of around 55 million transistors per mm² on TSMC 10nm, roughly double what we see on AMD’s Ryzen CPU (25 million per mm²) on GloFo 14nm. We were told that the NPU has its own power domain, and can be both frequency gated and power gated, although during normal operation it will only be frequency gated to improve response time from idle to wake up. Power consumption was not explicitly stated (‘under 1W’), but they did quote that a test of 1000 images being recognized drained a 4000 mAh battery by 0.19%, fluctuating between 0.25W and 0.67W.
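The figures in this paragraph lend themselves to a quick back-of-the-envelope check. The sketch below converts the quoted battery drain into energy per recognized image and compares the two transistor densities. The 3.8 V nominal cell voltage is an assumption (Huawei did not state it), and it is not clear whether the 0.19% drain covers the NPU alone or the whole device.

```python
# Figures quoted in the paragraph above.
BATTERY_MAH = 4000
DRAIN_FRACTION = 0.0019        # 0.19% of the battery
IMAGES = 1000
DENSITY_KIRIN = 55e6           # transistors per mm^2
DENSITY_RYZEN = 25e6           # transistors per mm^2
DIE_AREA_LIMIT_MM2 = 100       # die size is "under 100mm2"

# Assumption (not from Huawei): typical Li-ion nominal cell voltage.
NOMINAL_VOLTAGE = 3.8

battery_wh = BATTERY_MAH / 1000 * NOMINAL_VOLTAGE
energy_j = battery_wh * DRAIN_FRACTION * 3600   # 1 Wh = 3600 J
energy_per_image_j = energy_j / IMAGES

density_ratio = DENSITY_KIRIN / DENSITY_RYZEN
max_transistors = DENSITY_KIRIN * DIE_AREA_LIMIT_MM2

print(f"energy for the test: {energy_j:.0f} J")
print(f"energy per image:    {energy_per_image_j * 1000:.0f} mJ")
print(f"density ratio:       {density_ratio:.1f}x")
print(f"transistor ceiling:  {max_transistors / 1e9:.1f} billion")
```

Under that voltage assumption the test works out to roughly a tenth of a joule per image, and the density figure caps the transistor count at 5.5 billion for a die under 100mm².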

We did draw a few more specifications on the Kirin 970, unrelated to the NPU, out of senior management. The display controller can support a maximum screen resolution of 4K, and the Kirin 970 will support two SIM cards at 4G speeds at the same time, using a time mux strategy. While the modem is rated for Category 18 for downloads, giving 1.2 Gbps with 3x carrier aggregation, 4×4 MIMO and 256-QAM, the chip will do Category 13 uploads (up to 150 Mbps). The chip can handle VoLTE on both SIMs as well. Band support is substantial, given in the list below.
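The headline LTE rates follow from multiplying carriers, spatial layers and modulation density. Here is a rough sketch of that arithmetic, assuming about 75 Mbps per 20 MHz carrier per spatial layer at 64-QAM. That base rate is a common rule of thumb rather than a number from Huawei, and the carrier and modulation settings for the 150 Mbps figure come from the spec table, which lists it as the uplink rate.

```python
# Assumption: roughly 75 Mbps per 20 MHz carrier per spatial layer at 64-QAM.
BASE_MBPS_64QAM = 75.0
BITS_PER_SYMBOL = {"16-QAM": 4, "64-QAM": 6, "256-QAM": 8}


def peak_rate_mbps(carriers: int, layers: int, modulation: str) -> float:
    """Estimate peak LTE throughput for aggregated 20 MHz carriers."""
    scale = BITS_PER_SYMBOL[modulation] / BITS_PER_SYMBOL["64-QAM"]
    return carriers * layers * BASE_MBPS_64QAM * scale


# 3x carrier aggregation, 4x4 MIMO, 256-QAM (figures from the text above).
downlink = peak_rate_mbps(carriers=3, layers=4, modulation="256-QAM")
# 2x20MHz carrier aggregation, 64-QAM, single layer (from the spec table).
uplink = peak_rate_mbps(carriers=2, layers=1, modulation="64-QAM")

print(f"estimated downlink: {downlink:.0f} Mbps")
print(f"estimated uplink:   {uplink:.0f} Mbps")
```

These are theoretical peaks; real-world throughput depends on the network, signal conditions and how many carriers an operator actually aggregates.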

Audio is an odd one out here, with the onboard audio rated to 32-bit and 384 kHz (although SNR will depend on the codec). That’s about 12-15 bits more than needed and easily several times the sampling rate required to cover the range of human hearing, but high numbers are seemingly required. The storage was confirmed as UFS 2.1, with LPDDR4X-1833 for the memory, and the use of a new i7 sensor hub.

HiSilicon High-End Kirin SoC Lineup

| SoC | Kirin 970 | Kirin 960 | Kirin 950/955 |
|---|---|---|---|
| CPU | 4x A73 @ 2.40 GHz, 4x A53 @ 1.80 GHz | 4x A73 @ 2.36GHz, 4x A53 @ 1.84GHz | 4x A72 @ 2.30/2.52GHz, 4x A53 @ 1.81GHz |
| GPU | ARM Mali-G72MP12, ? MHz | ARM Mali-G71MP8, 1037MHz | ARM Mali-T880MP4, 900MHz |
| LPDDR4 Memory | 2x 32-bit LPDDR4 @ 1833 MHz | 2x 32-bit LPDDR4 @ 1866MHz, 29.9GB/s | 2x 32-bit LPDDR4 @ 1333MHz, 21.3GB/s |
| Interconnect | ARM CCI | ARM CCI-550 | ARM CCI-400 |
| Storage | UFS 2.1 | UFS 2.1 | eMMC 5.0 |
| ISP/Camera | Dual 14-bit ISP | Dual 14-bit ISP (Improved) | Dual 14-bit ISP, 940MP/s |
| Encode/Decode | 2160p60 Decode; 2160p30 Encode | 2160p30 HEVC & H.264 Decode & Encode; 2160p60 HEVC Decode | 1080p H.264 Decode & Encode; 2160p30 HEVC Decode |
| Integrated Modem | Kirin 970 Integrated LTE (Category 18); DL = 1200 Mbps, 3x20MHz CA, 256-QAM; UL = 150 Mbps, 2x20MHz CA, 64-QAM | Kirin 960 Integrated LTE (Category 12/13); DL = 600Mbps, 4x20MHz CA, 64-QAM; UL = 150Mbps, 2x20MHz CA, 64-QAM | Balong Integrated LTE (Category 6); DL = 300Mbps, 2x20MHz CA, 64-QAM; UL = 50Mbps, 1x20MHz CA, 16-QAM |
| Sensor Hub | i7 | i6 | i5 |
| NPU | Yes | No | No |
| Mfc. Process | TSMC 10nm | TSMC 16nm FFC | TSMC 16nm FF+ |
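
The table gives memory bandwidth for the Kirin 960 and Kirin 950/955 but not for the Kirin 970. The listed figures match the usual peak-bandwidth formula for double-data-rate memory, so the sketch below applies the same formula to all three parts. The function name is illustrative, and the Kirin 970 result is an estimate derived from the listed 1833 MHz clock rather than a number from Huawei.

```python
def lpddr_bandwidth_gbs(channels: int, width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s for a double-data-rate memory interface."""
    bytes_per_transfer = channels * width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * 2  # two transfers per clock cycle
    return bytes_per_transfer * transfers_per_second / 1e9


# Every part in the table uses a 2x 32-bit interface; clocks are in MHz.
for soc, clock in (("Kirin 970", 1833), ("Kirin 960", 1866), ("Kirin 950/955", 1333)):
    print(f"{soc}: {lpddr_bandwidth_gbs(2, 32, clock):.1f} GB/s")
```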