Hot Chips 2020 Live Blog: Manticore 4096-core RISC-V (3:30pm PT)
by Dr. Ian Cutress on August 18, 2020 6:30 PM EST- Posted in
- AI
- Live Blog
- RISC-V
- Hot Chips 32
- Manticore
06:35PM EDT - Who wants all the RISC-V cores?!?
06:36PM EDT - Ever growing demand for compute
06:36PM EDT - Energy efficiency is critical
06:37PM EDT - lots of CPUs burn power on superfluous elements of out-of-order
06:38PM EDT - Maximise compute datapath with respect to control
06:38PM EDT - Now for Manticore
06:38PM EDT - 220mm2 per chip
06:38PM EDT - (estimated in 22FDX GloFo)
06:38PM EDT - Four chiplets
06:39PM EDT - die-to-die serial link to each other die
06:39PM EDT - 8 GB HBM2 per die private to that die
06:40PM EDT - Four quadrants of 32 clusters per chiplet
06:40PM EDT - Clusters can do 64 TB/s with each other
06:41PM EDT - 4x L1 quadrants share an L1 cache
06:41PM EDT - Bandwidth thinning scheme to optimize bandwidth to HBM without affecting floorplan
06:41PM EDT - Support a lot of cluster-to-cluster traffic
06:42PM EDT - Each compute cluster has 8 RV32G Snitch cores
06:42PM EDT - Each core has a multi-format SIMD compute unit
06:42PM EDT - supports half-precision, bfloat, FP8
06:42PM EDT - Custom ISA extensions
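For anyone checking the 4096 in the title, the figures quoted so far multiply out as below. This is only a quick arithmetic sketch; the variable names are made up for illustration and the values are just the ones from the slides.

```python
# Figures quoted in the talk; names are illustrative, not from the slides.
CHIPLETS = 4
QUADRANTS_PER_CHIPLET = 4
CLUSTERS_PER_QUADRANT = 32
CORES_PER_CLUSTER = 8
HBM2_GB_PER_CHIPLET = 8

clusters_per_chiplet = QUADRANTS_PER_CHIPLET * CLUSTERS_PER_QUADRANT
cores_per_chiplet = clusters_per_chiplet * CORES_PER_CLUSTER
total_cores = CHIPLETS * cores_per_chiplet
total_hbm2_gb = CHIPLETS * HBM2_GB_PER_CHIPLET

print(f"clusters per chiplet: {clusters_per_chiplet}")
print(f"cores per chiplet:    {cores_per_chiplet}")
print(f"total cores:          {total_cores}")
print(f"total HBM2 (GB):      {total_hbm2_gb}")
```

So each chiplet carries 128 clusters and 1024 cores, and four chiplets give the 4096 cores and 32 GB of HBM2 in total.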
06:44PM EDT - Goal was to maximize compute/control die area ratio
06:44PM EDT - Async with DMA Engine
06:44PM EDT - XSSR - Stream semantic registers
06:44PM EDT - Turn register read/writes into implicit memory load/stores
06:45PM EDT - increases FPU/ALU utilization by 3x-5x
06:46PM EDT - Extension in the core register file
06:47PM EDT - Latency tolerant approach
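A toy way to picture the SSR idea of register reads becoming implicit loads. This Python sketch is an assumption-laden model, not the Snitch hardware or its ISA: `StreamRegister`, its fields, and the instruction counts are all hypothetical, and loop bookkeeping (index updates, branches) is ignored.

```python
class StreamRegister:
    """Toy model of a stream semantic register (read direction only)."""

    def __init__(self, memory, base, stride, length):
        self.memory = memory
        self.addr = base
        self.stride = stride
        self.remaining = length

    def read(self):
        # Each register read pops the next stream element, so the loop
        # body needs no explicit load instruction.
        if self.remaining == 0:
            raise IndexError("stream exhausted")
        value = self.memory[self.addr]
        self.addr += self.stride
        self.remaining -= 1
        return value


def dot_with_explicit_loads(memory, a_base, b_base, n):
    """Baseline: two loads plus one multiply-add per element."""
    issued = 0
    acc = 0.0
    for i in range(n):
        a = memory[a_base + i]
        b = memory[b_base + i]
        acc += a * b
        issued += 3  # load, load, multiply-add
    return acc, issued


def dot_with_streams(memory, a_base, b_base, n):
    """Streamed: operands arrive through register reads."""
    ssr0 = StreamRegister(memory, a_base, 1, n)
    ssr1 = StreamRegister(memory, b_base, 1, n)
    issued = 0
    acc = 0.0
    for _ in range(n):
        acc += ssr0.read() * ssr1.read()
        issued += 1  # only the multiply-add is issued
    return acc, issued


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
baseline = dot_with_explicit_loads(memory, 0, 4, 4)
streamed = dot_with_streams(memory, 0, 4, 4)
print(baseline, streamed)
```

In this toy count the multiply-add goes from one in three issued instructions to every issued instruction; the real gain depends on the loop overhead the sketch leaves out.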
06:47PM EDT - XFREP - Floating Point Repetition Buffer (programmable micro-loop buffer)
06:48PM EDT - custom instruction indicates start of hardware loop block
06:48PM EDT - 'Pseudo-dual issue' as integer core can work at the same time
06:49PM EDT - SSRs only work on float-only hardware loops
06:49PM EDT - FREP marks the loop
06:50PM EDT - For example, reduction!
06:52PM EDT - single-issue core can saturate an FPU
06:52PM EDT - IPC > 1
06:52PM EDT - FREP acts as instruction amplifier
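The "instruction amplifier" point can be sketched as a count of fetched versus executed operations. This is a hypothetical Python model, not the actual FREP encoding or semantics: `run_frep`, the one-fetch cost of the marker, and the dictionary-based state are assumptions for illustration, with Python iterators standing in for SSR streams.

```python
def run_frep(body, repeats, state):
    """Replay a list of FP operations `repeats` times from a toy loop buffer."""
    # Assumed cost model: one fetch for the loop marker, one per body op.
    fetched = 1 + len(body)
    executed = 0
    for _ in range(repeats):
        for op in body:
            op(state)
            executed += 1
    return fetched, executed


def fma(state):
    # acc += a * b, with operands popped from two streams.
    state["acc"] += next(state["a"]) * next(state["b"])


a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
state = {"acc": 0.0, "a": iter(a), "b": iter(b)}

fetched, executed = run_frep([fma], repeats=len(a), state=state)
amplification = executed / fetched
print(state["acc"], fetched, executed, amplification)
```

Two fetched instructions drive four FPU operations here, and the ratio grows with the repeat count, which is one way to read the IPC > 1 claim for a single-issue core.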
06:53PM EDT - increased utilization for matmul and dotproduct that might be memory bound
06:54PM EDT - Up to 80 DP GFLOPs/W per cluster
06:55PM EDT - Close tracking of roofline model
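For reference, the roofline model caps attainable throughput at the lower of peak compute and memory bandwidth times arithmetic intensity. A minimal Python sketch follows; the peak and bandwidth values are placeholders chosen for round numbers, not Manticore figures.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Attainable GFLOP/s under the roofline model."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


# Placeholder machine: NOT numbers from the talk.
PEAK_GFLOPS = 16.0
BANDWIDTH_GBS = 32.0

# Double-precision dot product: 2 flops per pair of 8-byte operands.
dot_intensity = 2 / 16
dot_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, dot_intensity)

# A kernel with heavy data reuse, e.g. a blocked matmul (assumed intensity).
matmul_intensity = 4.0
matmul_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, matmul_intensity)

print(dot_bound, matmul_bound)
```

With these placeholder values the dot product is memory bound at 4 GFLOP/s while the high-intensity kernel reaches the 16 GFLOP/s compute ceiling. "Close tracking" means measured kernels sit near whichever of those two limits applies.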
06:56PM EDT - 9mm2 prototype made
06:56PM EDT - 22nm FDX
06:56PM EDT - Forward Body Biasing
06:56PM EDT - This is only a small prototype of the core of a chiplet
06:57PM EDT - Snitch cores used for DVFS and IO management
06:58PM EDT - Full 4096 core system expected 27 DP Flops/sec
07:00PM EDT - In max perf mode, competitive vs A100 FP64
07:02PM EDT - snitch inside
07:05PM EDT - Q&A time
07:06PM EDT - Q: how does the compiler target the new instructions? A: Loop detection to promote loops that have the required characteristics. Might not always hit all cases - so go down QDNN, offer optimized low level kernels that frameworks would support
07:07PM EDT - Q: Productization? A: Concept so far to explore the key components. Wanted lean and mean RISC-V cores. Still missing the key components at SoC level, such as interconnects, which as a university is hard to come by. Looking into generating and taping out a later system as a research concept in the future.
07:08PM EDT - That's a wrap. Short break until the next session, at half-past. Baidu + Alibaba NPUs
7 Comments
jchang6 - Tuesday, August 18, 2020 - link
any mention of memory latency?
evilpaul666 - Wednesday, August 19, 2020 - link
Typo? "06:58PM EDT - Full 4096 core system expected 27 DP Flops/sec."
TomWomack - Wednesday, August 19, 2020 - link
And around we go - those SSRs look a lot like a classic vector machine from the Cray era, I look forward to version 2 where you can use an SSR as a destination rather than having them used only for reduction. Maybe there is a technology point where using full-strength instruction decoders and out-of-order issue to run simple vector loops is no longer the right answer ...
nandnandnand - Wednesday, August 19, 2020 - link
Some of these bot commenters are becoming self-aware.
Spunjji - Thursday, August 20, 2020 - link
This made me think of Sony's concept behind Cell, only scaled up to an extreme extent (and not a solution being applied to the wrong problem). Kinda cool!
Alan123 - Monday, August 24, 2020 - link
Hey, I wanted to give something real basic to help with anyone that wakes up with neck pain maybe https://www.kickstarter.com/projects/676962738/the...
My recommendation to all of you is to research the best pillow for you so you don't have to wake up with any kind of pain and your spinal cord will thank you!
That's if you haven't done any research on it...I woke up with neck pain and I was like wtf!?!?.. oh..hello..my stinky stupid feathered pillow is the culprit lol
This might sound simple but...this could be seriously life changing...isn't it so funny.. a simple pillow can change the wellness of your life? lol
HOPE THIS HELPS YOU GUYS.. IM OFF TO DO MORE RESEARCH!!!!!!!!!!
Unashamed_unoriginal_username_x86 - Saturday, March 27, 2021 - link
Eat my dongus you nerd